Method, apparatus and system for encoding and decoding the transform units of a coding unit

ABSTRACT

Disclosed is a method of decoding a luma transform and a plurality of chroma transforms from a video bitstream. The chroma transforms contain chroma data for a single colour channel. The method determines a value of a luma transform skip flag for the luma transform indicating whether data of the luma transform is encoded in the video bitstream as a spatial domain representation. A value of a chroma transform skip flag is determined for a first chroma transform of the plurality of chroma transforms indicating whether the data of the chroma transform is encoded in the video bitstream as a spatial domain representation. The method decodes the luma transform according to the determined luma transform skip flag and the plurality of chroma transforms according to the determined chroma transform skip flag for the first chroma transform.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is a continuation of U.S. patent application Ser. No. 14/440,861, filed on May 5, 2015, that is a national phase application of international patent application PCT/AU2013/001117 filed on Sep. 27, 2013, and the benefit under 35 U.S.C. § 119 of the filing date of Australian Patent Application No. 2012247040, filed Nov. 8, 2012, hereby incorporated by reference in its entirety as if fully set forth herein. That application is a divisional application of Australian Patent Application Nos. 2012232992, filed Sep. 28, 2012, hereby incorporated by reference in its entirety as if fully set forth herein.

TECHNICAL FIELD

The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding residual coefficients of a transform unit (TU), wherein the transform unit (TU) includes one or more transform units (TUs) and may be configured for multiple chroma formats, including a 4:2:2 chroma format, and wherein the residual coefficients of the transform unit (TU) may either represent data in a frequency domain or a spatial domain.

BACKGROUND

Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Recent developments in video coding standardisation have led to the formation of a group called the “Joint Collaborative Team on Video Coding” (JCT-VC). The Joint Collaborative Team on Video Coding (JCT-VC) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardisation Sector (ITU-T) of the International Telecommunication Union (ITU), known as the Video Coding Experts Group (VCEG), and members of the International Organisation for Standardisation/International Electrotechnical Commission Joint Technical Committee 1/Subcommittee 29/Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG).

The Joint Collaborative Team on Video Coding (JCT-VC) has the goal of producing a new video coding standard to significantly outperform a presently existing video coding standard, known as “H.264/MPEG-4 AVC”. The H.264/MPEG-4 AVC standard is itself a large improvement on previous video coding standards, such as MPEG-4 and ITU-T H.263. The new video coding standard under development has been named “high efficiency video coding (HEVC)”. The Joint Collaborative Team on Video Coding (JCT-VC) is also considering implementation challenges arising from technology proposed for high efficiency video coding (HEVC) that create difficulties when scaling implementations of the standard to operate at high resolutions in real time or at high frame rates. One implementation challenge is the complexity and size of logic used to support multiple ‘transform’ sizes for transforming video data between the frequency domain and the spatial domain.

SUMMARY

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.

According to one aspect of the present disclosure there is provided a method of decoding a luma transform and a plurality of chroma transforms from a video bitstream, the plurality of chroma transforms containing chroma data for a single colour channel, the method comprising:

determining a value of a luma transform skip flag for the luma transform, the luma transform skip flag indicating whether data of the luma transform is encoded in the video bitstream as a spatial domain representation;

determining a value of a chroma transform skip flag for a first chroma transform of the plurality of chroma transforms, the chroma transform skip flag indicating whether the data of the chroma transform is encoded in the video bitstream as a spatial domain representation; and

decoding the luma transform according to the determined value of the luma transform skip flag and the plurality of chroma transforms according to the determined value of the chroma transform skip flag for the first chroma transform.

According to another aspect there is provided a method of decoding a transform unit having a luma transform and two chroma transforms from a video bitstream, the two chroma transforms containing chroma data for a single colour channel according to a 4:2:2 chroma format, the method comprising:

determining a value of a luma transform skip flag for the luma transform, the luma transform skip flag indicating whether data of the luma transform is encoded in the video bitstream as a spatial domain representation;

determining a value of a chroma transform skip flag for a first chroma transform of the two chroma transforms, the chroma transform skip flag indicating whether the data of the chroma transforms is encoded in the video bitstream as a spatial domain representation; and

decoding the luma transform according to the determined value of the luma transform skip flag and decoding the two chroma transforms according to the determined value of the chroma transform skip flag for the first chroma transform.
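By way of illustration only, the following Python sketch shows one way the sharing of a single chroma transform skip flag between the chroma transforms of one colour channel might be expressed. The function names, the list-of-lists block representation and the caller-supplied inverse_transform are assumptions made for this sketch and do not appear in the disclosure; entropy decoding, scaling and clipping are omitted.

```python
from typing import Callable, List, Sequence, Tuple

Block = List[List[int]]  # hypothetical representation of one transform's data


def decode_block(data: Block, transform_skip: bool,
                 inverse_transform: Callable[[Block], Block]) -> Block:
    """Return residual samples for one transform."""
    if transform_skip:
        # The data is already a spatial domain representation.
        return [row[:] for row in data]
    # Otherwise the data is a frequency domain representation.
    return inverse_transform(data)


def decode_transform_unit(luma: Block, chroma: Sequence[Block],
                          luma_skip: bool, first_chroma_skip: bool,
                          inverse_transform: Callable[[Block], Block]
                          ) -> Tuple[Block, List[Block]]:
    """Decode one luma transform and the chroma transforms of one channel.

    The flag determined for the first chroma transform is applied to
    every chroma transform of the colour channel.
    """
    luma_out = decode_block(luma, luma_skip, inverse_transform)
    chroma_out = [decode_block(c, first_chroma_skip, inverse_transform)
                  for c in chroma]
    return luma_out, chroma_out
```

In this sketch the second chroma transform never carries a flag of its own, so both chroma transforms are always handled in the same domain.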

According to yet another aspect there is provided a method of decoding a luma transform and a plurality of chroma transforms from a video bitstream, the plurality of chroma transforms containing chroma data for a single colour channel, the method comprising:

splitting at least one rectangular one of the transforms into a plurality of square transforms; and

decoding the square transforms.

Desirably the splitting comprises splitting all rectangular transforms into square transforms such that the decoding only operates upon square transforms.
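The splitting step may be pictured with the following minimal Python sketch, offered under stated assumptions rather than as the disclosed implementation. The function name split_into_squares and the (x, y, side) tuple layout are hypothetical, and the sketch assumes the longer side is a whole multiple of the shorter side.

```python
from typing import List, Tuple


def split_into_squares(width: int, height: int) -> List[Tuple[int, int, int]]:
    """Split a width x height transform into square transforms.

    Returns (x, y, side) tuples in raster order. A square input is
    returned unchanged as a single square.
    """
    side = min(width, height)
    if side <= 0 or width % side or height % side:
        raise ValueError("dimensions must be positive and one a multiple of the other")
    return [(x, y, side)
            for y in range(0, height, side)
            for x in range(0, width, side)]


# A 4x8 (width x height) region yields two 4x4 squares, one above the other.
print(split_into_squares(4, 8))  # [(0, 0, 4), (0, 4, 4)]
```

With every rectangular transform split this way, downstream decoding logic only needs to handle square sizes.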

According to another aspect there is provided a method of decoding a transform unit containing chroma residual coefficients from a video bitstream, the transform unit containing at least one chroma residual coefficient array associated with a single chroma channel, the method comprising:

determining a size of the transform unit, the size being related to a hierarchical level of the transform unit in a corresponding coding unit;

decoding from the video bitstream the at least one chroma residual coefficient array using a predetermined maximum number of transforms for the chroma channel of the transform unit;

selecting an inverse transform for the decoded chroma residual coefficient arrays, the inverse transform being selected from a predetermined set of inverse transforms; and

applying the selected inverse transform to each of the chroma residual coefficient arrays to decode chroma residual samples for the chroma channel of the transform unit.

In yet another aspect, disclosed is a method for decoding residual data for a region in a transform unit (TU) in a colour channel encoded in a video bitstream, the method comprising:

first determining from the bitstream that a transform skip flag is enabled;

second determining if the region is a first region in the colour channel and in the transform unit (TU) having a coded block flag (CBF) value of one, and if so, decoding and storing a value of the transform skip flag, otherwise retrieving the value of the transform skip flag; and

decoding the residual data of the region using the value of the transform skip flag.

Here, preferably the first determining step further comprises determining that a coding unit transform quantisation bypass flag is not enabled and the transform size is 4×4.
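The decode-once, then-retrieve behaviour of this aspect can be sketched in Python as follows. This is a hedged illustration only: region_skip_flags, read_flag and the parameter names are hypothetical, and treating the flag as False when it is not signalled is an assumption of the sketch, not a statement of the disclosure.

```python
from typing import Callable, List, Optional, Sequence


def region_skip_flags(cbfs: Sequence[int],
                      read_flag: Callable[[], bool],
                      transform_skip_enabled: bool = True,
                      cu_transquant_bypass: bool = False,
                      transform_size: int = 4) -> List[Optional[bool]]:
    """Return the transform skip flag used for each region of one colour
    channel in one transform unit.

    cbfs holds the coded block flag of each region, in decoding order.
    read_flag stands in for decoding one flag from the bitstream.
    Regions with a coded block flag of zero carry no residual data and
    are reported as None.
    """
    # First determining step, including the preferred extra conditions.
    signalled = (transform_skip_enabled and not cu_transquant_bypass
                 and transform_size == 4)
    stored: Optional[bool] = None
    flags: List[Optional[bool]] = []
    for cbf in cbfs:
        if not cbf:
            flags.append(None)
        elif not signalled:
            flags.append(False)      # assumption: inferred as "not skipped"
        else:
            if stored is None:
                stored = read_flag() # first coded region: decode and store
            flags.append(stored)     # later regions: retrieve stored value
    return flags
```

For example, with coded block flags of 0, 1, 1 the flag is read from the bitstream once, at the second region, and retrieved for the third.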

According to another aspect of the present disclosure, there is provided a method of inverse transforming a plurality of residual coefficient arrays from a video bitstream configured for a 4:2:2 chroma format, the method comprising:

decoding a plurality of luma residual coefficient arrays, wherein each luma residual coefficient array corresponds to one 4×4 luma block of a plurality of 4×4 luma blocks, each 4×4 luma block being collocated with one 4×4 transform unit of a plurality of 4×4 transform units, a plurality of 4×4 luma blocks collectively occupying an 8×8 luma region;

decoding, after the luma residual coefficient arrays are decoded, a plurality of chroma residual coefficient arrays for a first colour channel, wherein each chroma residual coefficient array corresponds to a 4×4 chroma block and each 4×4 chroma block for the first colour channel is collocated with two of the plurality of 4×4 transform units;

decoding, after the chroma residual coefficient arrays for the first colour channel are decoded, a plurality of chroma residual coefficient arrays for a second colour channel, wherein each chroma residual coefficient array corresponds to a 4×4 chroma block and each chroma block for the second colour channel is collocated with two of the plurality of 4×4 transform units; and

applying an inverse transform to each of the decoded plurality of luma residual coefficient arrays, the decoded plurality of chroma residual coefficient arrays for the first colour channel and the decoded plurality of chroma residual coefficient arrays for the second colour channel.

Preferably, the number of luma residual coefficient arrays in the plurality of luma residual coefficient arrays is four. Desirably, the number of chroma residual coefficient arrays in the plurality of chroma residual coefficient arrays is two. Advantageously, one residual coefficient array includes all coefficients necessary for inverse transforming one 4×4 block.
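The block geometry and decoding order of this aspect may be checked with the short Python sketch below. It is a sketch under assumptions: the function names are hypothetical, the channel labels 'Y', 'U' and 'V' are chosen for illustration, and the only format fact relied upon is that the 4:2:2 chroma format halves the horizontal chroma resolution while keeping the full vertical resolution.

```python
from typing import List, Tuple

SIZE = 4     # side of each 4x4 block
REGION = 8   # side of the 8x8 luma region


def luma_blocks() -> List[Tuple[int, int]]:
    """Top-left luma positions of the four 4x4 luma blocks."""
    return [(x, y) for y in range(0, REGION, SIZE)
            for x in range(0, REGION, SIZE)]


def chroma_blocks_422() -> List[Tuple[int, int]]:
    """Top-left chroma positions of the 4x4 chroma blocks of one channel.

    In 4:2:2 the chroma region for an 8x8 luma region is 4 wide by 8 high.
    """
    return [(x, y) for y in range(0, REGION, SIZE)
            for x in range(0, REGION // 2, SIZE)]


def collocated_luma(chroma_xy: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Luma blocks covered by one 4x4 chroma block (8 wide by 4 high in luma)."""
    cx, cy = chroma_xy
    return [(lx, ly) for (lx, ly) in luma_blocks()
            if 2 * cx <= lx < 2 * (cx + SIZE) and cy <= ly < cy + SIZE]


def decode_order() -> List[Tuple[str, Tuple[int, int]]]:
    """All luma arrays first, then the first chroma channel, then the second."""
    order = [("Y", b) for b in luma_blocks()]
    for channel in ("U", "V"):
        order += [(channel, b) for b in chroma_blocks_422()]
    return order
```

The sketch yields four luma arrays followed by two arrays for each chroma channel, and each chroma block maps onto two horizontally adjacent 4×4 luma blocks.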

According to another aspect, disclosed is a method of forward transforming a plurality of residual coefficient arrays into a video bitstream configured for a 4:2:2 chroma format, the method comprising:

applying a forward transform to each of a plurality of luma residual coefficient arrays, a plurality of chroma residual coefficient arrays for a first colour channel and a plurality of chroma residual coefficient arrays for a second colour channel;

encoding the plurality of luma residual coefficient arrays, wherein each luma residual coefficient array corresponds to one 4×4 luma block of a plurality of 4×4 luma blocks, each 4×4 luma block being collocated with one 4×4 transform unit of a plurality of 4×4 transform units, a plurality of 4×4 luma blocks collectively occupying an 8×8 luma region;

encoding, after the luma residual coefficient arrays are encoded, the plurality of chroma residual coefficient arrays for the first colour channel, wherein each chroma residual coefficient array corresponds to a 4×4 chroma block and each 4×4 chroma block for the first colour channel is collocated with two of the plurality of 4×4 transform units; and

encoding, after the chroma residual coefficient arrays for the first colour channel are encoded, the plurality of chroma residual coefficient arrays for the second colour channel, wherein each chroma residual coefficient array corresponds to a 4×4 chroma block and each chroma block for the second colour channel is collocated with two of the plurality of 4×4 transform units.

Other aspects, including complementary encoders, are also disclosed.

BRIEF DESCRIPTION OF THE DRAWINGS

At least one embodiment of the present invention will now be described with reference to the following drawings, in which:

FIG. 1 is a schematic block diagram showing a video encoding and decoding system;

FIGS. 2A and 2B form a schematic block diagram of a general purpose computer system upon which one or both of the video encoding and decoding system of FIG. 1 may be practiced;

FIG. 3 is a schematic block diagram showing functional modules of a video encoder;

FIG. 4 is a schematic block diagram showing functional modules of a video decoder;

FIGS. 5A and 5B schematically illustrate chroma formats for representing frame data;

FIG. 6A is a schematic representation of an exemplary transform tree of a coding unit;

FIG. 6B is a schematic representation of the exemplary transform tree arranged on a luma sample grid;

FIG. 6C is a schematic representation of the exemplary transform tree arranged on a chroma sample grid;

FIG. 7 is a schematic illustration of a data structure representing a luma channel of the exemplary transform tree;

FIG. 8 illustrates a data structure representing a chroma channel of the exemplary transform tree;

FIGS. 9A and 9B schematically show a bitstream structure that encodes the exemplary transform tree;

FIGS. 9C, 9D and 9E schematically show an alternative bitstream structure that encodes the exemplary transform tree;

FIG. 10 is a schematic flow diagram showing a method for encoding the exemplary transform tree;

FIG. 11 is a schematic flow diagram showing a method for decoding the exemplary transform tree;

FIGS. 12A to 12C schematically show residual scan patterns of a 4×8 transform unit;

FIG. 13 is a schematic flow diagram showing a method for encoding the exemplary transform unit;

FIG. 14 is a schematic flow diagram showing a method for decoding the exemplary transform unit;

FIG. 15 schematically shows possible arrangements of 4×4 transforms for 4×4 and 8×8 transform units (TUs);

FIG. 16 schematically illustrates exemplary chroma regions for an implementation;

FIG. 17 is a schematic flow diagram showing a method for decoding residual data of the exemplary transform unit; and

FIG. 18 schematically illustrates a transform skip operation applied to a 4×8 chroma region with a 4×8 (non-square) transform.

DETAILED DESCRIPTION INCLUDING BEST MODE

Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.

FIG. 1 is a schematic block diagram showing function modules of a video encoding and decoding system 100 that may utilise techniques for coding syntax elements representative of inferred subdivision of transform units into multiple transforms for a chroma channel. The system 100 includes a source device 110 and a destination device 130. A communication channel 120 is used to communicate encoded video information from the source device 110 to the destination device 130. In some cases, the source device 110 and destination device 130 may comprise respective mobile telephone hand-sets, in which case the communication channel 120 is a wireless channel. In other cases, the source device 110 and destination device 130 may comprise video conferencing equipment, in which case the communication channel 120 is typically a wired channel, such as an internet connection. Moreover, the source device 110 and the destination device 130 may comprise any of a wide range of devices, including devices supporting over the air television broadcasts, cable television applications, internet video applications and including applications where the encoded video is captured on some storage medium or a file server.

As illustrated, the source device 110 includes a video source 112, a video encoder 114 and a transmitter 116. The video source 112 typically comprises a source of captured video frame data, such as an imaging sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote imaging sensor. Examples of source devices 110 that may include an imaging sensor as the video source 112 include smart-phones, video camcorders and network video cameras. The video encoder 114 converts the captured frame data from the video source 112 into encoded video data and will be described further with reference to FIG. 3. The encoded video data is typically transmitted by the transmitter 116 over the communication channel 120 as encoded video information. It is also possible for the encoded video data to be stored in some storage device, such as a “Flash” memory or a hard disk drive, until later being transmitted over the communication channel 120.

The destination device 130 includes a receiver 132, a video decoder 134 and a display device 136. The receiver 132 receives encoded video information from the communication channel 120 and passes received video data to the video decoder 134. The video decoder 134 then outputs decoded frame data to the display device 136. Examples of the display device 136 include a cathode ray tube, a liquid crystal display, such as in smart-phones, tablet computers, computer monitors or in stand-alone television sets. It is also possible for the functionality of each of the source device 110 and the destination device 130 to be embodied in a single device.

Notwithstanding the exemplary devices mentioned above, each of the source device 110 and destination device 130 may be configured within a general purpose computing system, typically through a combination of hardware and software components. FIG. 2A illustrates such a computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, which may be configured as the video source 112, and a microphone 280; and output devices including a printer 215, a display device 214, which may be configured as the display device 136, and loudspeakers 217. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220, which may represent the communication channel 120, may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional “dial-up” modem. Alternatively, where the connection 221 is a high capacity (e.g., cable) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220. The transceiver device 216 may provide the functionality of the transmitter 116 and the receiver 132 and the communication channel 120 may be embodied in the connection 221.

The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in FIG. 2A, the local communications network 222 may also couple to the wide-area network 220 via a connection 224, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface 211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 116 and the receiver 132 and the communication channel 120 may also be embodied in the local communications network 222.

The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable, external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, optical drive 212, networks 220 and 222 may also be configured to operate as the video source 112, or as a destination for decoded video data to be stored for reproduction via the display 214.

The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PC's and compatibles, Sun SPARCstations, Apple Mac™ or like computer systems.

Where appropriate or desired, the video encoder 114 and the video decoder 134, as well as methods described below, may be implemented using the computer system 200 wherein the video encoder 114, the video decoder 134 and the processes of FIGS. 10 to 13, to be described, may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the video encoder 114, the video decoder 134 and the steps of the described methods are effected by instructions 231 (see FIG. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.

The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 200 preferably effects an advantageous apparatus for implementing the video encoder 114, the video decoder 134 and the described methods.

The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.

In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.

The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.

FIG. 2B is a detailed schematic block diagram of the processor 205 and a “memory” 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and semiconductor memory 206) that can be accessed by the computer module 201 in FIG. 2A.

When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of FIG. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of FIG. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfill various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.

The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the computer system 200 of FIG. 2A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such is used.

As shown in FIG. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244-246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.

The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.

In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in FIG. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.

The video encoder 114, the video decoder 134 and the described methods may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The video encoder 114, the video decoder 134 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.

Referring to the processor 205 of FIG. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro-operations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:

(a) a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;

(b) a decode operation in which the control unit 239 determines which instruction has been fetched; and

(c) an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.

Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.

Each step or sub-process in the processes of FIGS. 10 to 13 to be described is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.

FIG. 3 is a schematic block diagram showing functional modules of the video encoder 114. FIG. 4 is a schematic block diagram showing functional modules of the video decoder 134. The video encoder 114 and video decoder 134 may be implemented using a general-purpose computer system 200, as shown in FIGS. 2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200, by software executable within the computer system 200 such as one or more software code modules of the software application program 233 resident on the hard disk drive 210 and being controlled in its execution by the processor 205, or alternatively by a combination of dedicated hardware and software executable within the computer system 200. The video encoder 114, the video decoder 134 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the described methods. Such dedicated hardware may include graphic processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories. In particular the video encoder 114 comprises modules 320-344 and the video decoder 134 comprises modules 420-434 which may each be implemented as one or more software code modules of the software application program 233.

Although the video encoder 114 of FIG. 3 is an example of a high efficiency video coding (HEVC) video encoding pipeline, processing stages performed by the modules 320-344 are common to other video codecs such as VC-1 or H.264/MPEG-4 AVC. The video encoder 114 receives captured frame data, such as frame data 310, as a series of frames, each frame including one or more colour channels. Each frame comprises one sample grid per colour channel. Colour information is represented using a ‘colour space’, such as recommendation ITU-R BT.709 (‘YUV’), although other colour spaces are also possible. When the YUV colour space is used, the colour channels include a luma channel (‘Y’) and two chroma channels (‘U’ and ‘V’). Moreover, differing amounts of information may be included in the sample grid of each colour channel, depending on the sampling of the image or through application of filtering to resample the captured frame data. Several sampling approaches, known as ‘chroma formats’, exist, some of which will be described with reference to FIGS. 5A and 5B.

The video encoder 114 divides each frame of the captured frame data, such as frame data 310, into regions generally referred to as ‘coding tree blocks’ (CTBs). Each coding tree block (CTB) includes a hierarchical quad-tree subdivision of a portion of the frame into a collection of ‘coding units’ (CUs). The coding tree block (CTB) generally occupies an area of 64×64 luma samples, although other sizes are possible, such as 16×16 or 32×32. In some cases even larger sizes, such as 128×128, may be used. The coding tree block (CTB) may be sub-divided via a split into four equal sized regions to create a new hierarchy level. Splitting may be applied recursively, resulting in a quad-tree hierarchy. As the coding tree block (CTB) side dimensions are always powers of two and the quad-tree splitting always results in a halving of the width and height, the region side dimensions are also always powers of two. When no further split of a region is performed, a ‘coding unit’ (CU) is said to exist within the region. When no split is performed at the top level of the coding tree block, the region occupying the entire coding tree block contains one coding unit (CU) that is generally referred to as a ‘largest coding unit’ (LCU). A minimum size also exists for each coding unit, such as the area occupied by 8×8 luma samples, although other minimum sizes are also possible. Coding units of this size are generally referred to as ‘smallest coding units’ (SCUs). As a result of this quad-tree hierarchy, the entirety of the coding tree block (CTB) is occupied by one or more coding units (CUs).
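The recursive quad-tree subdivision described above can be illustrated with a minimal Python sketch. The function name `split_ctb`, the `should_split` callback and the example split rule are hypothetical and chosen only for illustration; an actual encoder would decide each split from the frame content.

```python
from typing import Callable, List, Tuple

Region = Tuple[int, int, int]  # (x, y, size) in luma samples


def split_ctb(x: int, y: int, size: int,
              should_split: Callable[[int, int, int], bool],
              min_cu_size: int = 8) -> List[Region]:
    """Return the coding units (leaf regions) of a quad-tree rooted at (x, y)."""
    # A region at the minimum size cannot be split further (an SCU),
    # and a region that is not split contains exactly one coding unit.
    if size <= min_cu_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2  # each split halves the width and the height
    cus: List[Region] = []
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(split_ctb(x + dx, y + dy, half, should_split, min_cu_size))
    return cus


# Hypothetical split rule: split the 64x64 CTB once, then split only the
# top-left 32x32 region a second time.
def example_rule(x: int, y: int, size: int) -> bool:
    return size == 64 or (size == 32 and x == 0 and y == 0)


cus = split_ctb(0, 0, 64, example_rule)
for cu in cus:
    print(cu)
```

Whatever split rule is supplied, the returned regions do not overlap and together occupy the whole coding tree block, which is the property relied upon in the description.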

The video encoder 114 produces one or more arrays of samples, generally referred to as ‘prediction units’ (PUs) for each coding unit (CU). Various arrangements of prediction units (PUs) in each coding unit (CU) are possible, with a requirement that the prediction units (PUs) do not overlap and that the entirety of the coding unit (CU) is occupied by the one or more prediction units (PUs). This scheme ensures that the prediction units (PUs) cover the entire frame area.

The video encoder 114 operates by outputting, from a multiplexer module 340, a prediction unit (PU) 382. A difference module 344 outputs the difference between the prediction unit (PU) 382 and a corresponding 2D array of data samples, in the spatial domain, from a coding unit (CU) of the coding tree block (CTB) of the frame data 310, the difference being known as a ‘residual sample array’ 360. The residual sample array 360 may be transformed into the frequency domain in a transform module 320, or the residual sample array 360 may remain in the spatial domain, with a selection between the two being performed by a multiplexer 321, operating under the control of a transform skip control module 346 and signalled using a transform skip flag 386. The transform skip control module 346 determines the transform skip flag 386, which indicates whether the transform module 320 is used to transform the residual sample array 360 into a residual coefficient array 362, or whether use of the transform module 320 is skipped. Skipping the transform module 320 is referred to as a ‘transform skip’. When the transform is not skipped, the residual sample array 360 from the difference module 344 is received by the transform module 320, which converts (or ‘encodes’) the residual sample array 360 from a spatial representation to a frequency domain representation by applying a ‘forward transform’. The transform module 320 creates transform coefficients configured as the residual coefficient array 362 for each transform in a transform unit (TU) in a hierarchical sub-division of the coding unit (CU) into one or more transform units (TUs) generally referred to as a ‘transform tree’.
When a transform skip is performed, the residual sample array 360 is represented in the encoded bitstream 312 in the spatial domain and the transform module 320 is bypassed, resulting in the residual sample array 360 being passed directly to a scale and quantise module 322 via the multiplexer 321, which operates under control of the transform skip flag 386. The transform skip control module 346 may test the bit-rate required in the encoded bitstream 312 for each value of the transform skip flag 386 (i.e. transform skipped, or normal transform operation). The transform skip control module 346 may select a value for the transform skip flag 386 that results in lower bit-rate in the encoded bitstream 312, thus achieving higher compression efficiency. Each test performed by the transform skip control module 346 increases complexity of the video encoder 114, and thus it is desirable to reduce the number of cases for which the transform skip control module 346 performs the test to those where the benefit of selecting a transform skip outweighs the cost of performing the test. For example, this may be achieved by restricting the transform skip to specific transform sizes and block types, such as only 4×4 transforms for intra-predicted blocks (as described further below) in the high efficiency video coding (HEVC) standard under development. The transform skip functionality is especially useful for encoding residual sample arrays 360 that contain much ‘high frequency’ information. High frequency information is typically present in frame data 310 containing many sharp edges, such as where alphanumeric characters are embedded in the frame data 310. Other sources of frame data 310, such as computer generated graphics, may also contain much high frequency information. The DCT-like transform of the transform module 320 is optimised for frame data 310 containing mostly low frequency information, such as that obtained from an imaging sensor capturing a natural image.
The presence of the transform skip functionality thus provides considerable coding efficiency gain for applications which are relevant for the high efficiency video coding (HEVC) standard under development. For the video encoder 114, one drawback of supporting the transform skip functionality is the need to test two possible modes for the transform skip flag 386. As discussed below, the transform skip functionality is supported for a residual sample array 360 size of 4×4 samples and when the residual sample array 360 corresponds to an intra-predicted block, as described with reference to an intra-frame prediction module 336. However, the transform skip flag 386 is desirably separately signalled for each colour channel and thus a separate test may be performed by the transform skip control module 346 for each colour channel. Separate signalling for each colour channel is advantageous because the high frequency information may be concentrated in one or both chroma channels, thus being suited to transform skip, while the luma channel may have minimal high frequency information and thus benefit from using a transform. For example, coloured text on a coloured background would result in this scenario.

For the high efficiency video coding (HEVC) standard under development, the conversion to the frequency domain representation is implemented using a modified discrete cosine transform (DCT), in which a traditional DCT is modified to be implemented using shifts and additions. Various sizes for the residual sample array 360 and the transform coefficients 362 are possible, in accordance with the supported transform sizes. In the high efficiency video coding (HEVC) standard under development, transforms are performed on 2D arrays of samples having specific sizes, such as 32×32, 16×16, 8×8 and 4×4. A predetermined set of transform sizes available to a video encoder 114 may thus be said to exist. Moreover, as foreshadowed above, the set of transform sizes may differ between the luma channel and the chroma channels. Two-dimensional transforms are generally configured to be ‘separable’, enabling implementation as a first set of 1D transforms operating on the 2D array of samples in one direction (e.g. on rows), followed by a second set of 1D transforms operating on the 2D array of samples output from the first set of 1D transforms in the other direction (e.g. on columns). Transforms having the same width and height are generally referred to as ‘square transforms’. Additional transforms, having differing widths and heights, are also possible and are generally referred to as ‘non-square transforms’. Optimised implementations of the transforms may combine the row and column one-dimensional transforms into specific hardware or software modules, such as a 4×4 transform module or an 8×8 transform module. Transforms having larger dimensions require larger amounts of circuitry to implement, even though they may be infrequently used. Accordingly, a maximum transform size of 32×32 exists in the high efficiency video coding (HEVC) standard under development.
The integrated nature of transform implementation also introduces a preference to reduce the number of non-square transform sizes supported, as these will typically require entirely new hardware to be implemented, instead of reusing existing one-dimensional transform logic present from corresponding square transforms.
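The ‘separable’ property described above may be sketched in Python as a pass of 1D transforms over the rows followed by a pass over the columns. This sketch assumes an orthonormal floating-point DCT-II purely for illustration; it is not the shift-and-add integer transform of the HEVC standard under development, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_1d(v: List[float]) -> List[float]:
    """Orthonormal floating-point DCT-II of a 1D vector (illustrative only)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def dct_2d_separable(block: Matrix) -> Matrix:
    # First set of 1D transforms: one per row.
    rows_done = [dct_1d(row) for row in block]
    # Second set of 1D transforms: one per column of the row-pass output.
    cols_done = [dct_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)


# A flat 4x4 block concentrates all of its energy in the DC coefficient.
flat = [[10.0] * 4 for _ in range(4)]
coeffs = dct_2d_separable(flat)
print(round(coeffs[0][0], 6))
```

Because the same `dct_1d` routine serves both passes, a square transform needs only one 1D transform length, whereas a non-square transform needs two different lengths.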

Transforms are applied to both the luma and chroma channels. Differences between the handling of luma and chroma channels with regard to transform units (TUs) exist and will be discussed below with reference to FIGS. 5A and 5B. Each transform tree occupies one coding unit (CU) and is defined as a quad-tree decomposition of the coding unit (CU) into a hierarchy containing one transform unit (TU) at each leaf node of the transform tree (quad-tree) hierarchy, with each transform unit (TU) able to make use of transforms of the supported transform sizes. Similarly to the coding tree block (CTB), it is necessary for the entirety of the coding unit (CU) to be occupied by one or more transform units (TUs). At each level of the transform tree quad-tree hierarchy a ‘coded block flag value’ signals the possible presence of a transform in each colour channel, either in the present hierarchy level when no further splits are present, or to signal that lower hierarchy levels may contain at least one transform among the resulting transform units (TUs). When the coded block flag value is zero, no transform is performed for the corresponding colour channel of any transform unit (TU) of the transform tree, either at the present hierarchical level or at lower hierarchical levels. When the coded block flag value is one, the region contains a transform which must have at least one non-zero residual coefficient. In this manner, for each colour channel, zero or more transforms may cover a portion of the area of the coding unit (CU) varying from none up to the entirety of the coding unit (CU). Separate coded block flag values exist for each colour channel. Each coded block flag value is not required to be encoded, as cases exist where there is only one possible coded block flag value.

The output of the multiplexer 321 is thus one of the residual sample array 360 or the transform coefficient array 362, and is labelled simply as an array 363 in FIG. 3. The array 363 is input to the scale and quantise module 322 where the sample values thereof are scaled and quantised according to a determined quantisation parameter 384 to produce a residual data array 364. The scale and quantisation process results in a loss of precision, dependent on the value of the determined quantisation parameter 384. A higher value of the determined quantisation parameter 384 results in greater information being lost from the residual data. This increases the compression achieved by the video encoder 114 at the expense of reducing the visual quality of the output from the video decoder 134. The determined quantisation parameter 384 may be adapted during encoding of each frame of the frame data 310, or it may be fixed for a portion of the frame data 310, such as an entire frame. Other adaptations of the determined quantisation parameter 384 are also possible, such as quantising different residual coefficients with separate values. The residual data array 364 and determined quantisation parameter 384 are taken as input to an inverse scaling module 326 which reverses the scaling performed by the scale and quantise module 322 to produce rescaled data arrays 366, which are rescaled versions of the residual data array 364. The high efficiency video coding (HEVC) standard under development also supports a ‘lossless’ coding mode. When lossless coding is in use, the transform module 320 and the scale and quantise module 322 are both bypassed, resulting in the residual sample array 360 being input directly to the entropy encoder 324. In lossless mode, the inverse scaling module 326 and the inverse transform module 328 are also bypassed. The selection of lossless coding mode (as opposed to the usual ‘lossy’ mode) is encoded in the encoded bitstream 312 by the entropy encoder 324.
Logic to implement the bypass for lossless mode is not illustrated in FIG. 3. Bypassing the scale and quantise module 322 results in no quantisation of the residual coefficient array 362 or residual sample array 360, and an exact representation of the frame data 310 is encoded in the encoded bitstream 312 by the entropy encoder 324. The lossless coding mode results in low compression efficiency of the video encoder 114 and therefore is generally used only in applications where lossless coding is highly desirable, such as in medical applications.
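The loss of precision introduced by scaling and quantisation, and its dependence on the quantisation parameter, may be illustrated with the following Python sketch. The step-size mapping in `q_step` (step size doubling for every increase of six in the quantisation parameter) is an assumption made for illustration and is not taken from this description; the function names and sample values are likewise hypothetical.

```python
from typing import List


def q_step(qp: int) -> float:
    # Assumed mapping: the step size doubles every 6 QP values and is 1.0 at QP 4.
    return 2.0 ** ((qp - 4) / 6.0)


def scale_and_quantise(values: List[int], qp: int) -> List[int]:
    """Divide by the step size and round to the nearest integer level."""
    step = q_step(qp)
    return [int(round(v / step)) for v in values]


def inverse_scale(levels: List[int], qp: int) -> List[float]:
    """Reverse the scaling; the rounding error cannot be recovered."""
    step = q_step(qp)
    return [level * step for level in levels]


def total_error(original: List[int], rescaled: List[float]) -> float:
    return sum(abs(o - r) for o, r in zip(original, rescaled))


residual = [37, -13, 5, 0, -3, 1]          # hypothetical residual values
levels_22 = scale_and_quantise(residual, 22)
levels_34 = scale_and_quantise(residual, 34)
rescaled_22 = inverse_scale(levels_22, 22)
rescaled_34 = inverse_scale(levels_34, 34)
print(levels_22, total_error(residual, rescaled_22))
print(levels_34, total_error(residual, rescaled_34))
```

With these sample values the higher quantisation parameter leaves fewer non-zero levels to encode but a larger reconstruction error, matching the trade-off between compression and visual quality described above.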

The residual data array 364, the determined quantisation parameter 384 and the transform skip flag 386 are also taken as input to an entropy encoder module 324 which encodes the values of the residual data array 364 in an encoded bitstream 312 (or ‘video bitstream’). The values of the residual data array 364 in each transform unit (TU) are encoded in groups generally known as ‘sub-blocks’. Sub-blocks should preferably have the same dimensions regardless of the size of the transform, as this permits reuse of logic relating to sub-block processing. The residual data within one sub-block are generally referred to as a ‘data group’ (or a ‘coefficient group’, even when the transform skip is applied and the ‘coefficient group’ includes a spatial domain representation rather than a frequency domain representation) and for each data group, a data group flag is generally encoded to indicate if at least one residual data value within the data group is non-zero. In some cases the data group flag may be inferred and thus is not encoded. A flag is encoded for each residual data value belonging to a data group having a data group flag value of one to indicate if the residual data value is non-zero (‘significant’) or zero (‘non-significant’). Due to the loss of precision resulting from the scale and quantise module 322, the rescaled data arrays 366 are not identical to the original values in the array 363. The rescaled data arrays 366 from the inverse scaling module 326 are then output to an inverse transform module 328. The inverse transform module 328 performs an inverse transform from the frequency domain to the spatial domain to produce a spatial-domain representation 368 of the rescaled transform coefficient arrays 366 identical to a spatial domain representation that is produced at the video decoder 134. A multiplexer 369 is configured to complement the operation of the multiplexer 321.
The multiplexer 369 is configured to receive each of the rescaled data arrays 366 and the (transformed) spatial-domain representation 368 as inputs and, under control of the transform skip flag 386, select one of the inputs 366 or 368 as an input to a summation module 342.
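The data group flags and per-value significance flags described above can be sketched in Python as follows. The sketch assumes 4×4 sub-blocks and derives every flag explicitly; it does not model the cases where a data group flag is inferred, nor the entropy coding itself. The function name `data_group_flags` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Flags = Tuple[int, Optional[List[List[int]]]]


def data_group_flags(block: List[List[int]], sub: int = 4) -> Dict[Tuple[int, int], Flags]:
    """Map each sub-block position (column, row) to its data group flag and,
    when that flag is one, the significance flags of its residual data values."""
    height, width = len(block), len(block[0])
    result: Dict[Tuple[int, int], Flags] = {}
    for sy in range(0, height, sub):
        for sx in range(0, width, sub):
            group = [row[sx:sx + sub] for row in block[sy:sy + sub]]
            # Data group flag: one if at least one value in the group is non-zero.
            flag = int(any(v != 0 for row in group for v in row))
            # Significance flags exist only for groups whose flag is one.
            sig = [[int(v != 0) for v in row] for row in group] if flag else None
            result[(sx // sub, sy // sub)] = (flag, sig)
    return result


# Hypothetical 8x8 residual data array with a single non-zero value.
block = [[0] * 8 for _ in range(8)]
block[1][5] = -3
flags = data_group_flags(block)
for position, (flag, _) in sorted(flags.items()):
    print(position, flag)
```

Keeping the sub-block size fixed at 4×4 for every transform size means the same loop body serves all transforms, which is the logic reuse noted in the description.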

A motion estimation module 338 produces motion vectors 374 by comparing the frame data 310 with previous frame data from one or more sets of frames stored in a frame buffer module 332, generally configured within the memory 206. The sets of frames are known as ‘reference picture lists’. The motion vectors 374 are then input to a motion compensation module 334 which produces an inter-predicted prediction unit (PU) 376 by filtering samples stored in the frame buffer module 332, taking into account a spatial offset derived from the motion vectors 374. Not illustrated in FIG. 3, the motion vectors 374 are also passed as syntax elements to the entropy encoder module 324 for encoding in the encoded bitstream 312. The intra-frame prediction module 336 produces an intra-predicted prediction unit (PU) 378 using samples 370 obtained from the summation module 342, which sums the prediction unit (PU) 382 from the multiplexer module 340 and the spatial domain output of the multiplexer 369. The intra-frame prediction module 336 also produces an intra-prediction mode 380 which is sent to the entropy encoder 324 for encoding into the encoded bitstream 312.

Prediction units (PUs) may be generated using either an intra-prediction or an inter-prediction method. Intra-prediction methods make use of samples adjacent to the prediction unit (PU) that have previously been decoded (typically above and to the left of the prediction unit) in order to generate reference samples within the prediction unit (PU). Various directions of intra-prediction are possible, referred to as the ‘intra-prediction mode’. Inter-prediction methods make use of a motion vector to refer to a block from a selected reference frame. As the block may have any alignment down to a sub-sample precision, e.g. one eighth of a sample, filtering is necessary to create a block of reference samples for the prediction unit (PU). The decision on which method to use is made according to a rate-distortion trade-off between desired bit-rate of the resulting encoded bitstream 312 and the amount of image quality distortion introduced by either the intra-prediction or inter-prediction method. If intra-prediction is used, one intra-prediction mode is selected from the set of possible intra-prediction modes, also according to a rate-distortion trade-off. The multiplexer module 340 selects either the intra-predicted reference samples 378 from the intra-frame prediction module 336, or the inter-predicted prediction unit (PU) 376 from the motion compensation block 334, depending on the decision made by the rate distortion algorithm. The summation module 342 produces a sum 370 that is input to a deblocking filter module 330. The deblocking filter module 330 performs filtering along block boundaries, producing deblocked samples 372 that are written to the frame buffer module 332 configured within the memory 206. The frame buffer module 332 is a buffer with sufficient capacity to hold data from one or more past frames for future reference as part of a reference picture list.

For the high efficiency video coding (HEVC) standard under development, the encoded bitstream 312 produced by the entropy encoder 324 is delineated into network abstraction layer (NAL) units. Generally, each slice of a frame is contained in one NAL unit. The entropy encoder 324 encodes the residual data array 364, the intra-prediction mode 380, the motion vectors and other parameters, collectively referred to as ‘syntax elements’, into the encoded bitstream 312 by performing a context adaptive binary arithmetic coding (CABAC) algorithm. Syntax elements are grouped together into ‘syntax structures’; these groupings may contain recursion to describe hierarchical structures. In addition to ordinal values, such as an intra-prediction mode, or integer values, such as a motion vector, syntax elements also include flags, such as to indicate a quad-tree split. The motion estimation module 338 and motion compensation module 334 operate on motion vectors 374, having a precision of ⅛ of a luma sample, enabling precise modelling of motion between frames in the frame data 310.

Although the video decoder 134 of FIG. 4 is described with reference to a high efficiency video coding (HEVC) video decoding pipeline, processing stages performed by the modules 420-434 are common to other video codecs that employ entropy coding, such as H.264/MPEG-4 AVC, MPEG-2 and VC-1. The encoded video information may also be read from memory 206, the hard disk drive 210, a CD-ROM, a Blu-Ray™ disk or other computer readable storage medium. Alternatively the encoded video information may be received from an external source such as a server connected to the communications network 220 or a radio-frequency receiver.

As seen in FIG. 4, received video data, such as the encoded bitstream 312, is input to the video decoder 134. The encoded bitstream 312 may be read from memory 206, the hard disk drive 210, a CD-ROM, a Blu-Ray™ disk or other computer readable storage medium. Alternatively the encoded bitstream 312 may be received from an external source such as a server connected to the communications network 220 or a radio-frequency receiver. The encoded bitstream 312 contains encoded syntax elements representing the captured frame data to be decoded.

The encoded bitstream 312 is input to an entropy decoder module 420 which extracts the syntax elements from the encoded bitstream 312 and passes the values of the syntax elements to other blocks in the video decoder 134. The entropy decoder module 420 applies the context adaptive binary arithmetic coding (CABAC) algorithm to decode syntax elements from the encoded bitstream 312. The decoded syntax elements are used to reconstruct parameters within the video decoder 134. Parameters include zero or more residual data arrays 450, motion vectors 452, a prediction mode 454 and a transform skip flag 468. The residual data array 450 is passed to an inverse scale module 421, the motion vectors 452 are passed to a motion compensation module 434, and the prediction mode 454 is passed to an intra-frame prediction module 426 and to a multiplexer 428. The inverse scale module 421 performs inverse scaling on the residual data to create reconstructed data 455. When the transform skip flag 468 is zero, the inverse scale module 421 outputs the reconstructed data 455 to an inverse transform module 422. The inverse transform module 422 applies an ‘inverse transform’ to convert (or ‘decode’) the reconstructed data, which in this case are transform coefficients, from a frequency domain representation to a spatial domain representation, outputting a residual sample array 456 via a multiplexer module 423. When the value of the transform skip flag 468 is one, the reconstructed data 455, which in this case is in the spatial domain, are output as the residual sample array 456 via the multiplexer module 423. The inverse transform module 422 performs the same operation as the inverse transform module 328. The inverse transform module 422 must therefore be configured to provide a predetermined set of transform sizes required to decode an encoded bitstream 312 that is compliant with the high efficiency video coding (HEVC) standard under development.
When signalling in the encoded bitstream 312 indicates that the lossless mode was used, the video decoder 134 is configured to bypass the inverse scale module 421 and the inverse transform module 422 (not illustrated in FIG. 4), resulting in the residual data array 450 being input directly to a summation module 424.
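The selection between the three decoder paths described above (normal transform, transform skip, and lossless bypass) may be summarised by the following Python sketch. The function `reconstruct_residual` and the two stand-in callables are hypothetical; the stand-ins are deliberately trivial placeholders for the inverse scale module 421 and the inverse transform module 422, not implementations of them.

```python
from typing import Callable, List

Array = List[int]


def reconstruct_residual(residual_data: Array,
                         transform_skip_flag: int,
                         lossless: bool,
                         inverse_scale: Callable[[Array], Array],
                         inverse_transform: Callable[[Array], Array]) -> Array:
    """Return the residual sample array for one transform."""
    if lossless:
        # Lossless mode: both inverse scaling and inverse transform are bypassed.
        return list(residual_data)
    reconstructed = inverse_scale(residual_data)
    if transform_skip_flag == 1:
        # Transform skip: the reconstructed data is already in the spatial domain.
        return reconstructed
    # Normal operation: convert from the frequency domain to the spatial domain.
    return inverse_transform(reconstructed)


# Trivial placeholders, used only so that the three paths give distinct results.
def toy_inverse_scale(values: Array) -> Array:
    return [v * 8 for v in values]


def toy_inverse_transform(values: Array) -> Array:
    return list(reversed(values))


data = [1, 2, 3]
print(reconstruct_residual(data, 0, False, toy_inverse_scale, toy_inverse_transform))
print(reconstruct_residual(data, 1, False, toy_inverse_scale, toy_inverse_transform))
print(reconstruct_residual(data, 0, True, toy_inverse_scale, toy_inverse_transform))
```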

The motion compensation module 434 uses the motion vectors 452 from the entropy decoder module 420, combined with reference frame data 460 from a frame buffer block 432, configured within the memory 206, to produce an inter-predicted prediction unit (PU) 462 for a prediction unit (PU), being a prediction of output decoded frame data. When the prediction mode 454 indicates that the current prediction unit was coded using intra-prediction, the intra-frame prediction module 426 produces an intra-predicted prediction unit (PU) 464 for the prediction unit (PU) using samples spatially neighbouring the prediction unit (PU) and a prediction direction also supplied by the prediction mode 454. The spatially neighbouring samples are obtained from a sum 458, output from the summation module 424. The multiplexer module 428 selects the intra-predicted prediction unit (PU) 464 or the inter-predicted prediction unit (PU) 462 for a prediction unit (PU) 466, depending on the current prediction mode 454. The prediction unit (PU) 466, which is output from the multiplexer module 428, is added to the residual sample array 456 from the inverse scale and transform module 422 by the summation module 424 to produce the sum 458 which is then input to each of a deblocking filter module 430 and the intra-frame prediction module 426. The deblocking filter module 430 performs filtering along data block boundaries, such as transform unit (TU) boundaries, to smooth visible artefacts. The output of the deblocking filter module 430 is written to the frame buffer module 432 configured within the memory 206. The frame buffer module 432 provides sufficient storage to hold one or more decoded frames for future reference. Decoded frames 412 are also output from the frame buffer module 432 to a display device, such as the display device 136.

FIGS. 5A and 5B each show sample grids of a frame portion 500 and a frame portion 510 encoded using a 4:2:0 and a 4:2:2 chroma format respectively. The chroma format is specified as a configuration parameter to the video encoder 114 and the video encoder 114 encodes a ‘chroma_format_idc’ syntax element into the encoded bitstream 312 that specifies the chroma format. The video decoder 134 decodes the ‘chroma_format_idc’ syntax element from the encoded bitstream 312 to determine the chroma format in use. For example, when a 4:2:0 chroma format is in use, the value of chroma_format_idc is one, when a 4:2:2 chroma format is in use, the value of chroma_format_idc is two and when a 4:4:4 chroma format is in use, the value of chroma_format_idc is three. In FIGS. 5A and 5B, luma sample locations, such as a luma sample location 501, are illustrated using ‘X’ symbols, and chroma sample locations, such as a chroma sample location 502, are illustrated using ‘O’ symbols. By sampling the frame portion 500 at the points indicated, a sample grid is obtained for each colour channel when a 4:2:0 chroma format is applied. At each luma sample location X, the luma channel (‘Y’) is sampled, and at each chroma sample location O, both the chroma channels (‘U’ and ‘V’) are sampled. As shown in FIG. 5A, for each chroma sample location, a 2×2 arrangement of luma sample locations exists. By sampling the luma samples at the luma sample locations and chroma samples at the chroma sample locations indicated in the frame portion 510, a sample grid is obtained for each colour channel when a 4:2:2 chroma format is applied. The same allocation of samples to colour channels is made for the frame portion 510 as for the frame portion 500. In contrast to the frame portion 500, twice as many chroma sample locations exist in frame portion 510. In frame portion 510 the chroma sample locations are collocated with every second luma sample location. Accordingly, in FIG. 5B, for each chroma sample location, an arrangement of 2×1 luma sample locations exists.
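The relationship between the ‘chroma_format_idc’ value and the dimensions of the chroma sample grid may be sketched in Python as below. The table name `LUMA_PER_CHROMA` and the function `chroma_grid_size` are hypothetical; the 1×1 entry for the 4:4:4 chroma format reflects the usual meaning of that format (one chroma sample location per luma sample location) rather than a statement made in this description.

```python
from typing import Dict, Tuple

# chroma_format_idc -> (luma sample locations per chroma sample location
#                       horizontally, and vertically)
LUMA_PER_CHROMA: Dict[int, Tuple[int, int]] = {
    1: (2, 2),  # 4:2:0, a 2x2 arrangement of luma locations per chroma location
    2: (2, 1),  # 4:2:2, a 2x1 arrangement of luma locations per chroma location
    3: (1, 1),  # 4:4:4, assumed one-to-one
}


def chroma_grid_size(luma_width: int, luma_height: int,
                     chroma_format_idc: int) -> Tuple[int, int]:
    """Return (width, height) of the sample grid of each chroma channel."""
    div_x, div_y = LUMA_PER_CHROMA[chroma_format_idc]
    return luma_width // div_x, luma_height // div_y


for idc in (1, 2, 3):
    print(idc, chroma_grid_size(64, 64, idc))
```

For the same luma grid, the 4:2:2 result has twice as many chroma sample locations as the 4:2:0 result, consistent with the comparison of frame portions 500 and 510.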

Various allowable dimensions of transform units were described above in units of luma samples. The region covered by a transform applied for the luma channel will thus have the same dimensions as the transform unit dimensions. As the transform units also encode chroma channels, the applied transform for each chroma channel will have dimensions adapted according to the particular chroma format in use. For example, when a 4:2:0 chroma format is in use, a 16×16 transform unit (TU) will use a 16×16 transform for the luma channel, and an 8×8 transform for each chroma channel. One special case is that when a 4×4 transform is used for the luma channel there is no corresponding 2×2 transform available (when the 4:2:0 chroma format is applied) or 4×2 transform available (when the 4:2:2 chroma format is applied) that could be used for the chroma channels. In this special case, a 4×4 transform for each chroma channel may cover the region occupied by multiple luma transforms.

FIG. 6A is a schematic representation of an exemplary transform tree of a coding unit (CU) 602 (depicted with a thick border), within a coding tree block (CTB) 600 of the frame. A single quad-tree subdivision divides the coding tree block (CTB) 600 into four 32×32 coding units (CUs), such as the coding unit (CU) 602. An exemplary transform tree exists within the coding unit (CU) 602. The exemplary transform tree includes several quad-tree subdivisions, resulting in ten transform units (TUs) numbered as such in FIG. 6A, for example the transform unit #9 (TU) 604. The transform units #1-#10 cover the entirety of the coding unit (CU) 602. Each quad-tree subdivision divides a region spatially into four quadrants, resulting in four smaller regions. Each transform unit (TU) has a transform depth value, corresponding to a hierarchical level of the transform unit (TU) within the transform tree. The hierarchical level indicates the number of quad-tree subdivisions performed before the quad-tree subdivision terminated, resulting in an instance of a transform unit (TU) that occupies the corresponding region. For example, the transform unit #9 (TU) 604 occupies one quarter of the area of the coding unit (CU) 602 and therefore has a transform depth of one. Each transform unit (TU) has an associated size (or ‘transform size’), generally described as the dimensions of the region containing the transform unit (TU) on the luma sample grid. The size is dependent on the coding unit (CU) size and the transform depth. Transform units (TUs) with a transform depth of zero have a size equal to the size of the corresponding coding unit (CU). Each increment of the transform depth results in a halving of the size of transform units (TUs) present in the transform tree at the given transform depth.
As the frame includes a luma channel and chroma channels, the coding unit (CU) 602 occupies a region on both the luma sample grid and the chroma sample grid and thus each transform unit (TU) includes information describing both the luma samples on the luma sample grid and the chroma samples on the chroma sample grid. The nature of the information for each transform unit (TU) is dependent on the processing stage of the video encoder 114 or the video decoder 134. At the input to the transform module 320 and the output of the inverse scale and transform module 422, the residual sample array 360 and 456 respectively contain information for each transform unit (TU) in the spatial domain. The residual sample array 360 and 456 may be further divided into a ‘chroma residual sample array’ and a ‘luma residual sample array’, due to differences in processing between the luma channel and the chroma channels. At the output from the scale and quantise module 322 and the input of the inverse scale and transform module 422, the residual data array 364 and 450 respectively contain information for each transform unit (TU) in the frequency domain. The residual data arrays 364 and 450 may be further divided into a ‘chroma residual data array’ and a ‘luma residual data array’, due to differences in processing between the luma channel and the chroma channels.

FIG. 6B illustrates an exemplary transform tree 630, corresponding to the exemplary transform tree of FIG. 6A, for the luma channel of a 32×32 coding unit (CU), containing a set of transform units (TUs) and occupying the coding unit (CU) 602, which occupies a 32×32 luma sample array on the luma sample grid. FIG. 7 illustrates a data structure 700 that represents the exemplary transform tree 630. In FIG. 6B, boxes numbered 1 to 10 indicate transform units present within region 632 (exemplified by several transform units (TUs) 640), and each box is contained in a region that is not further sub-divided (indicated by a box with dashed border).

In FIG. 6B, boxes numbered 1 and 9 contain 16×16 transforms for the luma channel, boxes numbered 2, 3 and 8 contain 8×8 transforms for the luma channel and boxes numbered 4 to 7 contain 4×4 transforms for the luma channel. The corresponding region (dashed box) for each of these boxes has a coded block flag value of one, to indicate the presence of a transform.

The presence or absence of a transform for each colour channel is specified by a separate coded block flag value which is used in each of encoding and decoding of the bitstream, but which need not be transmitted in the bitstream, as will be discussed below. Consequently, the number of residual coefficient arrays 450 output from the entropy decoder 420 is dependent on the coded block flag values. When no significant coefficients are present (i.e. all coefficients are zero) in any colour channel, the number of residual data (coefficient) arrays 450 output from the entropy decoder 420 is zero.

In FIG. 7, the circles represent split transform flag values with the split transform flag value being indicated inside the corresponding circle. In FIG. 7, the triangles represent coded block flag values, with the coded block flag value being indicated inside the corresponding triangle. The squares represent transform units, with each transform numbered to accord with the transform numbering present in FIG. 6B.

The uppermost hierarchical level of the exemplary transform tree 630 contains a region 632 occupying a 32×32 coding unit (CU). A split transform flag value 702 indicates that the region 632 is sub-divided into four 16×16 regions, such as a region 634, thus defining a ‘non-leaf’ node of the exemplary transform tree 630. For each 16×16 region, a further split transform flag value, such as a split transform flag value 704, indicates whether the respective 16×16 region should be further sub-divided into four 8×8 regions. For example, the region 634 is not further sub-divided, as indicated by the split transform flag value 704 of zero, thus defining a ‘leaf’ node of the exemplary transform tree 630. In contrast, a region 638 is further sub-divided into four 4×4 regions (such as a region 636), as indicated by a split transform flag value 712 of one. The recursive split structure present in the transform tree 630 is analogous to the quad-tree split present in the coding tree block (CTB). For the luma channel, at the ‘leaf’ nodes of the quad-tree, the presence of a transform in the transform unit (TU) is signalled by a coded block flag value; for example, a coded block flag value 708 of one indicates the presence of a transform 710 in the region 634.

As a transform may be used to represent residual data in each region, regions are not permitted to be smaller than the smallest supported transform size, such as 4×4 luma samples for the luma channel. Additionally, for regions larger than the largest available transform size, a split transform flag value of one is inferred. For example, for a transform tree with a top level of a 64×64 coding unit, an automatic sub-division (i.e. not signalled in the encoded bitstream 312) into four 32×32 regions occurs when the largest supported transform size is 32×32 luma samples.
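
The inference of split transform flag values at the size limits may be sketched as follows. This is an illustrative sketch only, assuming square luma regions; the names `split_transform_flag_value` and `read_flag` are hypothetical, with `read_flag` standing in for whatever mechanism decodes one flag from the encoded bitstream.

```python
def split_transform_flag_value(region_size, min_size, max_size, read_flag):
    """Return the split transform flag value for a square luma region.

    read_flag is a hypothetical callable that decodes one flag from the
    bitstream; it is invoked only when the value cannot be inferred.
    """
    if region_size > max_size:
        # Region exceeds the largest supported transform: split inferred.
        return 1
    if region_size <= min_size:
        # Region is already the smallest supported transform: no split.
        return 0
    return read_flag()
```

With a largest supported transform size of 32×32 and a smallest of 4×4, a 64×64 region yields an inferred value of one and a 4×4 region yields an inferred value of zero, in both cases without consuming any flag from the bitstream.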

A lower right 16×16 region 642 contains a transform unit (TU) (numbered 10 (ten) and shaded) with no transform for the luma channel and therefore has a corresponding coded block flag value 716 of zero.

FIGS. 6C and 8 illustrate the exemplary transform tree 630, corresponding to the exemplary transform tree of FIG. 6A, for a chroma channel, configured for the 4:2:2 chroma format and containing a set of transforms for a chroma channel corresponding to the transform tree 630 for the luma channel and represented by a data structure 800. As the transform tree hierarchy is common by virtue of the structure of FIG. 6A between the luma channel and the chroma channels, the split transform flag values are shared between the data structures 700 and 800. In contrast to the data structure 700, the data structure 800 includes a coded block flag value with each transform split flag value of one (i.e. on non-leaf nodes of the transform tree). For example, a coded block flag value 802 of one is associated with the transform split flag 702. If the coded block flag value on a non-leaf node of the transform tree is zero, coded block flag values on the child nodes are inferred as zero (and no corresponding coded block flags are encoded in the encoded bitstream 312). Coded block flag values at non-leaf regions enable terminating the encoding of coded block flags at lower levels of the transform tree for each chroma channel if no significant residual coefficients are present in any of the child regions, even though significant residual coefficients may be present in the luma channel. This is a common situation for typical captured frame data, as the majority of information is present in the luma channel.
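
The hierarchical inference of chroma coded block flag values may be sketched as follows. This is a minimal sketch under stated assumptions: `chroma_coded_block_flag` and `read_flag` are hypothetical names, and a parent value of `None` is used here to denote the top level of the transform tree, where no parent exists.

```python
def chroma_coded_block_flag(parent_cbf, read_flag):
    """Return the coded block flag value of one chroma channel at a node.

    parent_cbf is the value at the parent node, or None at the top level.
    read_flag is a hypothetical callable that decodes one flag from the
    bitstream.
    """
    if parent_cbf == 0:
        # Parent signalled no significant coefficients in any child
        # region, so the value is inferred and nothing is decoded.
        return 0
    return read_flag()
```

Once a zero is encountered at a non-leaf node, every descendant node for that chroma channel receives an inferred zero, so no further coded block flags are decoded for that sub-tree.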

When the video encoder 114 and the video decoder 134 are configured for a 4:4:4 chroma format, the chroma region of each chroma channel of any given transform unit (TU) of a size that is not one of the predetermined set of transform unit (TU) sizes has identical dimensions to the luma region of the given transform unit (TU) (i.e. when an inferred split does not take place). When the video encoder 114 and the video decoder 134 are configured for a 4:4:4 chroma format, the chroma region of each chroma channel of any given transform unit (TU) of a size that is one of the predetermined set of transform unit (TU) sizes has dimensions smaller than those of the luma region of the given transform unit (TU) (i.e. when an inferred split does take place).

When a 4:2:2 chroma format is in use, the coding unit (CU) 602 includes a 16×32 region 662 of FIG. 6C of chroma samples for each chroma channel and thus occupies a 16×32 region on the chroma sample grid. FIG. 6C illustrates the regions on a chroma sample grid, drawn as an array of chroma samples, with each chroma sample equally spaced horizontally and vertically (in contrast to FIG. 5B). Due to the use of the 4:2:2 chroma format, each chroma region of FIG. 6C appears horizontally compressed with respect to the corresponding luma region of FIG. 6B. The split transform flag value 702 of one divides the 16×32 region 662, corresponding to the coding unit (CU) 602, into four 8×16 regions, such as an 8×16 region 664. The 8×16 region 664 has a non-square shape and is also larger in size than other non-square regions illustrated in FIG. 6C, such as a 4×8 region 670. For each 8×16 region, a split transform flag value, such as the split transform flag value 704, indicates whether the corresponding 8×16 region should be further sub-divided into four smaller 4×8 regions, in an analogous manner to the quad-tree splitting present in the transform tree 630 for the luma sample array. An upper right 8×16 region 672 is further sub-divided into four 4×8 regions. A coded block flag value 804 of one indicates that each of the four 4×8 regions could contain significant residual coefficients. A coded block flag for each 4×8 region is thus required to indicate the presence of a transform for the corresponding region. Of these four 4×8 regions, a lower left 4×8 region 674 (shaded) contains a transform unit (TU) but does not contain a transform and therefore has a coded block flag value 814 of zero. The remaining 4×8 regions, such as the region 670, each have a transform and therefore have corresponding coded block flag values of one. The upper left 8×16 region is sub-divided into two equal-sized 8×8 regions. 
In contrast to the quad-tree subdivision, no corresponding split transform flag is present in the encoded bitstream 312.

Splitting a region of a channel, such as a chroma channel, of a transform unit (TU) into multiple regions (each of which may have a transform), without signalling being present in the encoded bitstream 312, is referred to as an ‘inferred split’. The inferred split eliminates the need to introduce hardware supporting a non-square transform for this case (8×16). Instead, square transforms, such as a first 8×8 transform 666, are used. As it is possible for each of the regions resulting from the inferred split to contain all-zero residual information, it is necessary to specify the presence of a transform in each region resulting from the inferred split. Accordingly, separate coded block flag values are required for each region resulting from an inferred split. In this case, coded block flag values 806 and 808 correspond to the first 8×8 transform 666 and a second 8×8 transform 668 respectively. For transform units (TUs) where no inferred split takes place, a coded block flag value for each chroma channel specifies the presence or absence of a transform for the region occupied by the transform unit (TU) for the chroma channel. When an inferred split takes place, a separate coded block flag value (not illustrated in FIG. 8) is required for each of the resulting regions; however, implementations may retain a coded block flag value attributable to the entire transform unit (TU). This separate coded block flag value, attributable to the entire transform unit (TU), could be inferred as ‘one’ in all cases, or could be determined by performing a logical ‘OR’ operation on the coded block flag values of the regions resulting from the split. If the separate coded block flag value is determined from the coded block flag value of each region resulting from the split, the separate coded block flag value may be encoded in the encoded bitstream 312 by the entropy encoder 324 and decoded from the encoded bitstream 312 by the entropy decoder 420 as an additional coded block flag (not illustrated in FIG. 9). 
In such a case, when the separate coded block flag value is zero, the coded block flag value of each region from the split may be inferred to be zero and when the separate coded block flag value is one, the coded block flags for each region from the split are encoded in the encoded bitstream 312 by the entropy encoder 324 and decoded from the encoded bitstream 312 by the entropy decoder 420.
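
The option of deriving a coded block flag value for the entire transform unit (TU) by a logical ‘OR’ operation, and of using that value to gate the per-region coded block flags, may be sketched as follows. The function names and the `read_flag` callable are hypothetical and serve only to illustrate this one option.

```python
def whole_tu_cbf(region_cbfs):
    """Encoder side: logical OR of the per-region coded block flag values."""
    return int(any(region_cbfs))


def decode_region_cbfs(whole_cbf, num_regions, read_flag):
    """Decoder side: recover per-region coded block flag values.

    read_flag is a hypothetical callable that decodes one flag from the
    bitstream. When whole_cbf is zero, every per-region value is inferred
    as zero and no flags are decoded.
    """
    if whole_cbf == 0:
        return [0] * num_regions
    return [read_flag() for _ in range(num_regions)]
```

For the lower right 8×16 array 676, where neither 8×8 region contains a transform, the derived value is zero and both per-region values are inferred without decoding further flags.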

The lower left 8×16 region 680 of the 16×32 region 662 illustrates an inferred split where an 8×8 transform is present in the upper 8×8 inferred region 682 but no 8×8 transform is present in the lower 8×8 inferred region 684. A lower right 8×16 array 676 (shaded) contains a transform unit (TU) but does not contain a transform in either square 8×8 region resulting from the inferred split and therefore has coded block flag values 810 and 812 of zero.

The presence of two chroma channels results in a duplication of the structure depicted in FIG. 6C, with separate coded block flag values used to specify the presence of transforms for each chroma channel. In this implementation, a split is inferred for chroma region sizes other than 4×8, resulting in the use of a 4×8 rectangular transform, such as a 4×8 transform 816 (contained in region 670), and enabling reuse of existing square transforms in other cases (e.g. 8×8, 16×16). Thus, a set of predetermined region sizes (such as 8×16 and 16×32) may be said to exist, for which a split into two regions, and hence two transforms (of sizes 8×8 and 16×16), can be used. Different definitions of the predetermined set of region sizes for which an inferred split occurs are also possible and will allow a different combination of existing square transforms and rectangular transforms to be used. It is also possible for certain implementations to always infer a split, in which case no rectangular transform is introduced for the chroma 4:2:2 colour channels. In such a case, the predetermined set of region sizes for which an inferred split occurs contains all possible chroma region sizes (e.g. 4×8, 8×16 and 16×32 for a 4:2:2 chroma format, or 4×4, 8×8, 16×16 and 32×32 for a 4:4:4 chroma format).
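
The role of the predetermined set of region sizes for the 4:2:2 chroma format may be sketched as follows. This is an illustrative sketch only: the names `DEFAULT_SPLIT_SIZES` and `chroma_transforms_422` are hypothetical, sizes are expressed as (width, height) in chroma samples, and the sketch assumes luma transform unit sizes of 8×8 or larger so that the chroma region is one of 4×8, 8×16 or 16×32.

```python
# Predetermined set used in the implementation described above: a split
# is inferred for 8x16 and 16x32 chroma regions, but not for 4x8.
DEFAULT_SPLIT_SIZES = frozenset({(8, 16), (16, 32)})


def chroma_transforms_422(luma_size, split_sizes=DEFAULT_SPLIT_SIZES):
    """Return the (width, height) of each chroma transform of one channel.

    For 4:2:2, the chroma region has half the luma width and the full
    luma height. If the region size is in split_sizes, an inferred split
    produces two square transforms stacked vertically.
    """
    if luma_size < 8:
        raise ValueError("sketch assumes luma transform unit sizes >= 8")
    width, height = luma_size // 2, luma_size
    if (width, height) in split_sizes:
        return [(width, height // 2), (width, height // 2)]
    return [(width, height)]
```

Passing a set that contains all three region sizes models the ‘always infer a split’ variant, in which an 8×8 luma transform unit yields two 4×4 chroma transforms rather than one 4×8 rectangular transform.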

FIG. 16 is a schematic representation showing an example of ‘no rectangular transform’ for an implementation of an ‘always’ inferred split for all possible chroma region sizes (4×8, 8×16, and 16×32) for the 4:2:2 chroma format. In FIG. 16, each chroma region resulting from the inferred split is labelled ‘1’ (one) or ‘2’ (two).

When a 4:2:0 chroma format is in use, an inferred split does not take place for either chroma region in the transform unit (TU); therefore, the maximum number of transforms for each chroma channel is always one (the coded block flag value for each chroma channel controls whether the chroma transform occurs).

Although the video encoder 114 and the video decoder 134 are described independently of differences between the luma and chroma channels, the differing sample grids resulting from the chroma formats necessitate differences in the modules. Practical implementations may have separate ‘processing paths’ for the luma channel and for the chroma channels. Such an implementation may thus decouple processing of luma samples and chroma samples. As the encoded bitstream 312 is a single bitstream for both the luma and chroma channels, the entropy encoder 324 and the entropy decoder 420 are not decoupled. Additionally, a single frame buffer, such as the frame buffer 332, 432, holds luma and chroma samples and is thus not decoupled. However, the modules 322-330 and 334-340 and the modules 422-430 and 434 may have luma and chroma processing decoupled, enabling implementations to have separate logic for luma and chroma, thus creating a ‘luma processing path’ and a ‘chroma processing path’.

Certain implementations may infer a split for the 16×32 region of a chroma channel of a transform unit (TU) into two 16×16 regions, but not infer a split for the 8×16 and 4×8 cases. Such implementations avoid the need to introduce 32-point transform logic into the chroma processing path, instead being able to rely on 4, 8 or 16-point transform logic well-established in the art.

FIGS. 9A and 9B illustrate a syntax structure that can be used to encode or otherwise represent a hierarchical level of the transform tree. At non-leaf nodes of a transform tree, a syntax structure 900 is expanded recursively in accordance with data structures, such as the data structures 700 and 800, to define the syntax elements present in a portion of the encoded bitstream 312 corresponding to the transform tree. At leaf nodes of a transform tree (where no further sub-division takes place in the transform tree) a syntax structure 930 defines syntax elements present in the portion of the encoded bitstream 312. Typically, one data structure for luma and two data structures for chroma are present, although additional data structures are possible, such as for encoding an alpha channel or a depth map. Alternatively, fewer data structures may be utilised, such as in the case where a single data structure is shared by the chroma channels and coded block flag values are able to be shared between the chroma channels. A transform tree non-leaf node syntax structure 902 defines the encoding of one hierarchical level of a transform tree, such as the transform tree 630. A split transform flag 910 encodes a split transform flag value of one, such as the split transform flag value 702. This value indicates that the transform tree non-leaf node syntax structure 902 includes a lower hierarchical level that contains additional instances of the transform tree non-leaf node syntax structure 902 or transform tree leaf-node syntax structure 932, or ‘child nodes’. A coded block flag 912 encodes the coded block flag value 802 of one for the ‘U’ chroma channel and a coded block flag 914 encodes a further coded block flag value for the ‘V’ chroma channel. If the transform tree non-leaf node syntax structure 902 is defining the top level of the transform tree hierarchy then the coded block flags 912, 914 are present. 
If the transform tree non-leaf node syntax structure 902 is not defining the top level of the transform tree hierarchy then the coded block flags 912, 914 are only present if the corresponding coded block flags in the parent level of the transform tree hierarchy are present and one-valued. As a lower hierarchical level exists in the transform tree 630 (relative to the top hierarchical level), a quad-tree sub-division takes place. This sub-division results in four transform tree syntax structures 916, 918, 920, 922 (identified by a variable ‘blkIdx’ (block-index) numbered from zero to three) being included in the transform tree non-leaf node syntax structure 902.

The syntax structure 930 defines the encoding of the transform tree leaf node 932 (i.e. where no further sub-division takes place). A split transform flag 940 encodes a split transform flag value of zero, such as the split transform flag value 704.

A split transform flag is only encoded if the corresponding region is larger than a minimum size. For example, the region 636 has the smallest allowable size for a region, namely 4×4 luma samples (corresponding to the smallest supported luma transform size), so a transform split flag value 714 is inferred as zero and no split transform flag is encoded for the corresponding transform tree syntax structure.

For the region 636, chroma residual samples are transformed using a 4×8 chroma transform, hence no inferred transform split is present. Coded block flags, such as a coded block flag 942 and a coded block flag 946, may be present to signal the presence of a transform for each of the chroma channels. A coded block flag 950 signals the presence of a transform for the luma channel. Residual coefficients for the luma and chroma channels (if present) are present in a transform unit (TU) syntax structure 952. If the value of the coded block flag 950 is one, a luma transform skip flag 964 and a luma residual data block 954, encoding either residual coefficients for a luma transform or residual samples when the transform is skipped, are present in the encoded bitstream 312. The value of the luma transform skip flag 964 indicates whether the transform module 320 in the video encoder 114 and the inverse transform module 422 in the video decoder 134 are used (in normal operation) or bypassed (in transform skip operation). If the value of the coded block flag for each chroma channel is one, corresponding chroma transform skip flags 966 and 968 and chroma residual blocks 956 and 960 are present in the encoded bitstream 312. The transform skip flag 966 signals the transform skip mode for the chroma residual block 956, and the transform skip flag 968 signals the transform skip mode for the chroma residual block 960. When no inferred transform split occurs, the coded block flags 944 and 948 and the chroma residual blocks 958 and 962 are absent from the encoded bitstream 312. When no inferred transform split occurs, the transform skip flag for each chroma channel thus signals the transform skip mode for the corresponding chroma channel in the entirety of the region 636.

For the region 664, chroma residual samples are transformed using two 8×8 chroma transforms, hence an inferred transform split is present. The coded block flags 942 and 946, if present, signal the presence of 8×8 transforms for each chroma channel of the first 8×8 transform 666. The coded block flag 944 and the coded block flag 948, if present, signal the presence of 8×8 transforms for each chroma channel of the second 8×8 transform 668. If the value of the coded block flag 944 is one, the chroma residual block 958 is present in the encoded bitstream 312. If the value of the coded block flag 948 is one, the chroma residual block 962 is present in the encoded bitstream 312. The transform skip flag 966 signals the transform skip mode for the chroma residual blocks 956 and 958 and the transform skip flag 968 signals the transform skip mode for the chroma residual blocks 960 and 962. When an inferred transform split is present, the transform skip flag for each chroma channel thus signals the transform skip mode for the corresponding chroma channel in the entirety of the region 664, in accordance with the behaviour when no inferred transform split is present.
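
The sharing of a single transform skip flag across all chroma residual blocks of one chroma channel may be sketched as follows. This is a simplified sketch, not a definitive implementation: the name `reconstruct_chroma_channel` and the `inverse_transform` callable are hypothetical, a residual block of `None` is used here to denote a coded block flag value of zero, and any scaling applied in transform skip operation is omitted.

```python
def reconstruct_chroma_channel(residual_blocks, transform_skip_flag,
                               inverse_transform):
    """Apply one transform skip decision to every block of a chroma channel.

    residual_blocks holds one entry per region resulting from the
    inferred split (a single entry when no inferred split occurs).
    inverse_transform is a hypothetical callable mapping a frequency
    domain block to spatial domain residual samples.
    """
    reconstructed = []
    for block in residual_blocks:
        if block is None:
            # Coded block flag of zero: no residual for this region.
            reconstructed.append(None)
        elif transform_skip_flag:
            # Transform skip: data is already a spatial representation.
            reconstructed.append(block)
        else:
            reconstructed.append(inverse_transform(block))
    return reconstructed
```

In terms of the region 664, the two entries of `residual_blocks` would correspond to the chroma residual blocks 956 and 958 for one chroma channel, both governed by the transform skip flag 966.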

The syntax structure 930, as illustrated in FIG. 9B, shows the first and second transform of each chroma channel encoded adjacently for the inferred transform split. Other arrangements, such as encoding syntax elements for each chroma channel adjacently, or encoding syntax elements for each chroma channel interspersed with other syntax elements, may alternatively be used.

FIGS. 9C, 9D and 9E illustrate an alternative syntax structure 9100 that can be used to encode or otherwise represent a hierarchical level of the transform tree. At non-leaf nodes of a transform tree, the alternative syntax structure 9100 is expanded recursively in accordance with data structures, such as the data structures 700 and 800, to define the syntax elements present in a portion of the encoded bitstream 312 corresponding to the transform tree. An instance of the alternative syntax structure 9100 exists for each node in the transform tree, including the leaf nodes, which each contain a transform unit (TU). Where an ‘inferred split’ occurs to sub-divide the transform unit (TU) for each chroma channel, a syntax structure 9130 defines syntax elements present in the portion of the encoded bitstream 312 for the first sub-region resulting from the inferred split (e.g. the top half of a chroma region when a 4:2:2 chroma format is in use or the top-left quarter of a chroma region when a 4:4:4 chroma format is in use). Furthermore, a syntax structure 9160 defines syntax elements present in the portion of the encoded bitstream 312 for subsequent sub-regions resulting from the inferred split (e.g. one more sub-region for the lower half of a chroma region when a 4:2:2 chroma format is in use or the remaining three sub-regions of a chroma region when a 4:4:4 chroma format is in use). The notion of a ‘first’ sub-region and a ‘subsequent’ sub-region (e.g. a second and possibly a third or fourth sub-region) is implicit in the scanning order of the sub-regions of a region within a quad-tree. The scanning order is such that the sub-regions are traversed firstly from left to right and secondly from top to bottom. Typically, one data structure for luma and two data structures for chroma are present, although additional data structures are possible, such as for encoding an alpha channel or a depth map. 
Alternatively, fewer data structures may be utilised, such as in the case where a single data structure is shared by the chroma channels and coded block flag values are able to be shared between the chroma channels. A transform tree syntax structure 9102 defines the encoding of one hierarchical level of a transform tree, such as the transform tree 630.

For an instance of the transform tree syntax structure 9102 at a non-leaf node of a transform tree, such as the transform tree 630, a split transform flag 9110 encodes a split transform flag value of one, such as the split transform flag value 702. This value indicates that the instance of the transform tree syntax structure 9102 includes a lower hierarchical level, containing additional instances of the transform tree syntax structure 9102 or ‘child nodes’. A coded block flag 9112 encodes a coded block flag value in accordance with the description of the coded block flag 912. A coded block flag 9114 encodes a coded block flag value in accordance with the description of the coded block flag 914. As a lower hierarchical level exists in the transform tree 630 (relative to the top hierarchical level), a quad-tree sub-division takes place. This sub-division results in four transform tree syntax structures 9116, 9118, 9120, 9122 (identified by a ‘blkIdx’ variable numbered from zero to three) being included in the transform tree node syntax structure 9102. Each of the transform tree syntax structures 9116, 9118, 9120, 9122 is another instance of the transform tree syntax structure 9102. A coded block flag 9124 and a luma transform unit portion 9126, encoding either residual coefficients for a luma transform or residual samples when the transform is skipped, will be absent from the transform tree syntax structure 9102.

Implementations may also arrange the transform tree syntax structure 9102 such that the coded block flag 9124 and the luma transform unit portion 9126 (if present) are placed earlier in the transform tree syntax structure 9102, such as in between the coded block flag 9114 and the transform tree syntax structure 9116.

For an instance of the transform tree syntax structure 9102 at a leaf node of a transform tree, such as the transform tree 630, a split transform flag 9110 encodes a split transform flag value of zero, such as the split transform flag value 704. The instance of the transform tree syntax structure 9102 thus corresponds to a transform unit (TU) in the transform tree 630. The transform unit (TU) has a size determined in accordance with the coding unit (CU) containing the transform unit (TU), such as the coding unit (CU) 602, and the transform depth. The coded block flag 9112 encodes a coded block flag value of one to indicate that any of the chroma regions resulting from the inferred split for the ‘U’ chroma channel may have a coded block flag value of one. If the coded block flag 9112 encodes a value of zero, then the coded block flag value for each chroma region resulting from the inferred split for the ‘U’ chroma channel is inferred as zero. Even when the coded block flag 9112 encodes a value of one, implementations may still encode a coded block flag having a value of zero for each chroma region resulting from the inferred split. Therefore, implementations may omit the coded block flag 9112 from the encoded bitstream 312, instead always inferring a coded block flag value of one for the omitted coded block flag 9112. The coded block flag 9114 encodes a further coded block flag value for the ‘V’ chroma channel in a similar manner to the coded block flag 9112. For transform unit (TU) sizes that accord with those for which an inferred split into four chroma regions occurs (a maximum number of chroma residual coefficient arrays is four), the four transform tree syntax structures 9116, 9118, 9120, 9122 (identified by ‘blkIdx’ zero to three) are included in the transform tree node syntax structure 9102. 
For transform unit (TU) sizes that accord with those for which an inferred split into two chroma regions occurs (a maximum number of chroma residual coefficient arrays is two), two transform tree syntax structures, such as transform tree syntax structures 9116, 9118 (identified by ‘blkIdx’ zero and one) are included in the transform tree node syntax structure 9102. Each of the transform tree syntax structures 9116, 9118, 9120, 9122 is an instance of a transform tree for chroma syntax structure 9132. The coded block flag 9124 encodes a coded block flag value, such as the coded block flag value 708, specifying the presence or absence of a transform for the luma channel of the transform unit (TU). The luma portion of the transform unit 9126 encodes a luma transform skip flag as transform skip flag 9127 and a luma residual coefficient array as luma residual syntax elements 9128.

The transform tree for chroma syntax structure 9132, only existing for the first chroma region (or ‘sub-region’) when an inferred split takes place, includes a reduced set of the syntax of the transform tree syntax structure 930. A coded block flag 9142 encodes a coded block flag value for the ‘U’ chroma channel of the chroma region. A coded block flag 9144 encodes a coded block flag value for the ‘V’ chroma channel of the chroma region. A chroma portion of the transform unit (TU) 9146 encodes a subset of the transform unit (TU) syntax structure 952. The chroma portion of the transform unit (TU) 9146 encodes chroma transforms containing chroma data for a single colour channel. The chroma transforms are encoded in the form of a chroma residual coefficient array as chroma residual syntax elements 9150 for the ‘U’ chroma channel if the value of the coded block flag 9142 is one, and a chroma residual coefficient array as chroma residual syntax elements 9152 for the ‘V’ chroma channel if the value of the coded block flag 9144 is one (collectively, residual coefficient arrays for the ‘chroma transforms’). A transform skip flag 9148 is associated with the chroma residual syntax elements 9150 and encodes a transform skip flag value for the ‘U’ chroma channel, for each chroma region resulting from the inferred split. A transform skip flag 9151 is associated with the chroma residual syntax elements 9152 and encodes a transform skip flag value for the ‘V’ chroma channel, for each chroma region resulting from the inferred split. This association is by way of the transform skip flag being encoded in a ‘residual coding’ syntax structure that includes the corresponding residual syntax elements.

The transform tree for chroma syntax structure 9162, only existing for chroma regions other than the first chroma region (or ‘sub-region’) when an inferred split takes place, includes a reduced set of the syntax of the transform tree syntax structure 930. A coded block flag 9172 encodes a coded block flag value for the ‘U’ chroma channel of the chroma region. A coded block flag 9174 encodes a coded block flag value for the ‘V’ chroma channel of the chroma region. A chroma portion of the transform unit (TU) 9176 encodes a subset of the transform unit (TU) syntax structure 952. The chroma portion of the transform unit (TU) 9176 encodes a chroma residual coefficient array as chroma residual syntax elements 9180 for the ‘U’ chroma channel if the value of the coded block flag 9172 is one. The chroma portion of the transform unit (TU) 9176 encodes a chroma residual coefficient array as chroma residual syntax elements 9182 for the ‘V’ chroma channel if the value of the coded block flag 9174 is one. The transform skip mode for the region corresponding to the chroma residual syntax elements 9180 is determined from the transform skip flag 9148. The transform skip mode for the region corresponding to the chroma residual syntax elements 9182 is determined from the transform skip flag 9151. Implementations may make use of hardware registers, such as the registers 246, or the memory 206 to store the transform skip flag from the first chroma region for use in the subsequent sub-region(s).

The syntax structures 9130 and 9160, as illustrated in FIGS. 9D and 9E, show the first and second coded block flag encoded adjacently followed by the first and second chroma residual coefficient array of each chroma channel for the inferred transform split. Other arrangements, such as encoding the coded block flag and the chroma residual coefficient array adjacently for each chroma channel, may alternatively be used.

Although the inferred transform split is illustrated with the 8×16 region 664 split into two 8×8 regions, alternative implementations may perform the split for other regions. For example, some implementations may infer a split of a 16×32 region into two 16×16 regions. Such implementations advantageously avoid the need for a 32-point 1D transform in the chroma processing path. Since the 32-point 1D transform is not required for the chroma processing path when the 4:2:0 chroma format is applied, the requirement for the 32-point 1D transform is entirely removed from the chroma processing path. Implementations that use separate processing circuitry to decouple the luma and chroma channels may thus achieve a lower implementation cost in the chroma processing circuitry.

A 4:4:4 chroma format exists where there is one chroma sample location for each luma sample location. Accordingly, with this format, transforms for the chroma channel and the luma channel may have the same sizes. With a largest transform size of 32×32 in the luma processing path, this would require introducing a 32×32 transform into the chroma processing path for a decoupled implementation. Specific implementations may infer a split for each chroma channel to split a 32×32 region into four 16×16 regions, enabling reuse of the existing 16×16 transform in the chroma processing path. Since a 32×32 transform would only be used in the chroma processing path for the 4:4:4 chroma format, inferring a split for each chroma channel to split a 32×32 region into four 16×16 regions would enable the 32×32 transform to be removed from the chroma processing path, reducing the processing circuitry required. Such implementations would require four coded block flag values for each chroma channel, and thus up to four coded block flags coded in the syntax structure 930 for each chroma channel in the encoded bitstream 312.

Implementations supporting a 4:2:2 chroma format may also infer a split for each chroma channel to split a 16×32 region into four 8×16 regions. Such implementations require four coded block flag values for each chroma channel, and thus four coded block flags coded in the syntax structure 930 for each chroma channel in the encoded bitstream 312. A ‘CU3’, ‘CU4’, ‘CV3’ and ‘CV4’ coded block flag (not illustrated in FIG. 9B) may therefore be introduced in the transform unit (TU) syntax structure 952. Such implementations avoid introducing 32-point transform logic into the chroma processing path and, where 8×16 regions are not sub-divided, may reuse the 8×16 transform logic already required for transform units (TUs) of size 16×16 (in the luma channel), which require a transform of size 8×16 for the chroma channels.

FIG. 10 is a schematic flow diagram showing a method 1000 for encoding a transform unit (TU) by encoding the transform tree non-leaf node syntax structure 902 and the transform tree leaf node syntax structure 932. The method 1000 is described with reference to a chroma channel of the transform unit (TU); however, the method 1000 may be applied to any chroma channel of the transform unit (TU). As the transform tree non-leaf node syntax structure 902 and the transform tree leaf node syntax structure 932 describe one node in the transform tree, the method 1000 encodes one node of the transform tree into the encoded bitstream 312. The method 1000 may be implemented in hardware or by software executable on the processor 205, for example. The method 1000 is initially invoked for the top level of the transform tree and is capable of invoking itself (recursively) to encode child nodes of the transform tree. A determine transform unit size step 1002 determines the size of a transform unit (TU) in a transform tree according to the size of the coding unit (CU) that contains the transform tree and a transform depth value of the transform unit (TU). When the method 1000 is invoked at the top level of the transform tree, the transform depth value is set to zero; otherwise the transform depth value is provided by the parent instance of the method 1000. A split transform flag value, such as the split transform flag value 702, is encoded in the encoded bitstream 312 as split transform flag 910 if the transform depth value is less than the maximum allowed transform depth.

When the split transform flag value is one, chroma coded block flags 912 and 914 are encoded for each chroma channel only if the parent node of the transform tree hierarchy has a corresponding coded block flag value of one. The method 1000 then invokes a new instance of the method 1000 for each child node (represented in the portion of the encoded bitstream 312 by transform tree syntax structures 916, 918, 920 and 922) of the transform tree. Each instance of the method 1000, invoked for the child nodes, is provided with a transform depth value equal to the transform depth value of the present method 1000 instance incremented by one.

When the split transform flag value is zero, an identify maximum number of forward transforms step 1004 determines a maximum number (n) of transforms for each chroma channel of the region being encoded. When no inferred split takes place, this number n will be one. When a 4:2:2 chroma format is in use and a rectangular region of a chroma channel, such as the 8×16 region 664, is encountered and the region size is one of a predetermined set of region sizes (such as 16×32 and 8×16), an inferred split takes place and the maximum number of transforms will be two. Otherwise (the region size is not one of the predetermined set of region sizes) the maximum number of transforms will be one. For example, if 4×8 is not one of the predetermined set of region sizes, then the maximum number of transforms for a 4×8 region will be one. When a 4:4:4 chroma format is in use and the encountered region size is one of a predetermined set of region sizes (such as a 32×32 region), an inferred split takes place and the maximum number of transforms will be four. Otherwise (the region size is not one of the predetermined set of region sizes) the maximum number will be one. For example, if 8×8 is not one of the predetermined set of region sizes, then the maximum number of transforms for an 8×8 region will be one. Although the predetermined set of region sizes described above includes 8×16, other predetermined sets of region sizes are possible, such as only 16×32 when a 4:2:2 chroma format is in use or 32×32 when a 4:4:4 chroma format is in use.
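
The determination made in the step 1004 may be sketched as follows. This is a minimal illustration only, not a definitive implementation: the names `INFERRED_SPLIT_SIZES` and `max_chroma_transforms` are hypothetical, region sizes are assumed to be expressed as (width, height) on the chroma sample grid, and the particular predetermined sets shown are merely the example sets mentioned above.

```python
# Hypothetical example sets of region sizes (width, height) that trigger an
# inferred split. Other predetermined sets are possible.
INFERRED_SPLIT_SIZES = {
    "4:2:2": {(8, 16), (16, 32)},
    "4:4:4": {(32, 32)},
}


def max_chroma_transforms(chroma_format, region_size):
    """Return the maximum number (n) of transforms for one chroma region."""
    sizes = INFERRED_SPLIT_SIZES.get(chroma_format, set())
    if region_size not in sizes:
        # No inferred split takes place for this region size.
        return 1
    # A 4:2:2 region is split into two halves, a 4:4:4 region into quarters.
    return 2 if chroma_format == "4:2:2" else 4
```

With the example sets above, an 8×16 region in the 4:2:2 chroma format yields two transforms, whereas a 4×8 region (not in the set) yields one.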

For each chroma channel, if the parent node had a coded block flag value of one, then for each of the n transforms, a coded block flag is encoded in the encoded bitstream 312. For example, when the number of transforms is equal to two, coded block flags 942 and 944 indicate the presence of a transform for each of the two regions inferred by the split. A select forward transform step 1006 selects a forward transform from a predetermined set of forward transforms, for each of the maximum number of transforms, based on a transform unit (TU) size, which is in turn dependent on the transform depth, and thus related to a hierarchical level of the transform unit in the largest coding unit. When the transform depth is equal to zero, the transform unit (TU) size is equal to the coding unit (CU) size. For each increment of the transform depth, the transform unit (TU) size is halved. For a 32×32 coding unit (CU) size, a transform depth of zero and using a 4:2:2 chroma format, the transform unit (TU) size will thus be 32×32 and the region size for chroma will thus be 16×32. For example, when the maximum number of transforms is two and the region size for chroma is 16×32, then a 16×16 forward transform is selected for each of the 16×16 regions for chroma resulting from the inferred split.
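
The size arithmetic described above can be illustrated with a short sketch. The function names are hypothetical and sizes are assumed to be (width, height) pairs; the sketch only reproduces the halving of the transform unit (TU) size per transform depth, the chroma region size implied by the chroma format, and the transform size resulting from an inferred split into n regions.

```python
def transform_unit_size(cu_size, transform_depth):
    """Side length of the (square) TU: the CU size halved once per depth level."""
    return cu_size >> transform_depth


def chroma_region_size(tu_size, chroma_format):
    """(width, height) of the chroma region collocated with a square TU."""
    if chroma_format == "4:2:0":
        return (tu_size // 2, tu_size // 2)
    if chroma_format == "4:2:2":
        return (tu_size // 2, tu_size)
    if chroma_format == "4:4:4":
        return (tu_size, tu_size)
    raise ValueError("unsupported chroma format: " + chroma_format)


def chroma_transform_sizes(region_size, n):
    """Sizes of the n transforms applied to one chroma region."""
    width, height = region_size
    if n == 1:
        return [(width, height)]
    if n == 2:
        # Inferred split into an upper and a lower half.
        return [(width, height // 2)] * 2
    if n == 4:
        # Inferred split into four quadrants.
        return [(width // 2, height // 2)] * 4
    raise ValueError("n must be 1, 2 or 4")
```

For a 32×32 coding unit at transform depth zero in the 4:2:2 chroma format, `chroma_region_size(transform_unit_size(32, 0), "4:2:2")` gives a 16×32 region, and `chroma_transform_sizes((16, 32), 2)` gives two 16×16 transforms, matching the example in the text.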

An apply forward transform step 1008 performs the forward transform for each of the maximum number of transforms on the corresponding region that has a coded block flag value of one. The apply forward transform step 1008 is generally performed by the transform module 320. This results in a conversion of each chroma residual sample array (spatial domain representation) into a chroma residual coefficient array (frequency domain representation).

An encode chroma residual coefficient arrays step 1010 encodes the chroma residual coefficient array for each of the maximum number of transform regions of each chroma channel having a coded block flag value of one into the encoded bitstream 312. The number of chroma residual coefficient arrays encoded for a given transform unit for a given chroma channel depends on the coded block flag value of each transform and will thus vary from zero to (at most) the maximum number of transforms. For example, when the number of transforms is two and both chroma channels have coded block flag values of one for each of the count values, then the chroma residual blocks 956, 958, 960 and 962 are encoded in the encoded bitstream 312. If the coded block flag value for each transform for a given chroma channel is zero, then no chroma residual block is encoded in the encoded bitstream 312 for that chroma channel. The encode chroma residual coefficient arrays step 1010 is generally performed by the entropy encoder 324.

FIG. 11 is a schematic flow diagram showing a method 1100 for decoding a transform unit (TU) by decoding the transform tree non-leaf node syntax structure 902 and the transform tree leaf node syntax structure 932. The method 1100 is described with reference to a chroma channel of the transform unit (TU); however, the method 1100 may be applied to any chroma channel of the transform unit (TU). As the transform tree non-leaf node syntax structure 902 and the transform tree leaf node syntax structure 932 describe one node in the transform tree, the method 1100 decodes one node of the transform tree from the encoded bitstream 312. The method 1100 may be performed in appropriate hardware or alternatively in software, for example executable by the processor 205. The method 1100 is initially invoked for the top level of the transform tree and is capable of invoking itself (recursively) to decode child nodes of the transform tree. A determine transform unit (TU) size step 1102 determines a transform unit (TU) size in a manner identical to the determine transform unit size step 1002. The determine transform unit size step 1102 determines the size of a transform unit (TU) in a transform tree according to the size of the coding unit (CU) that contains the transform tree and a transform depth value of the transform unit (TU). When the method 1100 is invoked at the top level of the transform tree, the transform depth value is set to zero; otherwise the transform depth value is provided by the parent instance of the method 1100. A split transform flag value, such as the split transform flag value 702, is decoded from the encoded bitstream 312 as split transform flag 910 if the transform depth value is less than the maximum allowed transform depth.

When the split transform flag value is one, chroma coded block flags 912 and 914 are decoded for each chroma channel only if the parent node of the transform tree hierarchy has a corresponding coded block flag value of one. The method 1100 then invokes a new instance of the method 1100 for each child node (represented in the portion of the encoded bitstream 312 by transform tree syntax structures 916, 918, 920 and 922) of the transform tree. Each instance of the method 1100, invoked for the child nodes, is provided with a transform depth value equal to the transform depth value of the present method 1100 instance incremented by one.
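
The recursive invocation described above may be sketched as follows. The sketch is a simplified illustration under stated assumptions: the function name `leaf_transform_unit_sizes` is hypothetical, the split transform flags are supplied by an iterator standing in for the entropy decoder 420, the coded block flags and residual data are omitted entirely, and a split transform flag that is not present in the bitstream is assumed to take the value zero.

```python
def leaf_transform_unit_sizes(cu_size, max_depth, split_flags, depth=0):
    """Walk one transform tree and return the TU size of each leaf node.

    split_flags is an iterator yielding the split transform flag values in
    the order in which they would be decoded from the bitstream.
    """
    tu_size = cu_size >> depth
    # A split transform flag is only decoded below the maximum depth;
    # otherwise it is assumed here to be zero.
    split = next(split_flags) if depth < max_depth else 0
    if not split:
        return [tu_size]
    sizes = []
    for _ in range(4):
        # Each child instance receives the present depth incremented by one.
        sizes += leaf_transform_unit_sizes(cu_size, max_depth, split_flags, depth + 1)
    return sizes
```

For example, `leaf_transform_unit_sizes(32, 2, iter([1, 0, 1, 0, 0]))` describes a 32×32 coding unit whose second quadrant is split once more, giving leaf sizes 16, 8, 8, 8, 8, 16 and 16.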

When the split transform flag value is zero, an identify maximum number of inverse transforms step 1104 determines a (maximum) number (n) of transforms for each of the at least one chroma residual coefficient arrays present in each chroma channel of the region being decoded, in a manner identical to the identify maximum number (n) of forward transforms step 1004. When no inferred split takes place, this number n will be one. When a 4:2:2 chroma format is in use and a rectangular region of a chroma channel, such as the 8×16 region 664, is encountered and the region size is one of a predetermined set of region sizes (such as 16×32 and 8×16), an inferred split takes place and the maximum number of transforms will be two. Otherwise (the region size is not one of the predetermined set of region sizes) the maximum number of transforms will be one. For example, if 4×8 is not one of the predetermined set of region sizes, then the maximum number of transforms for a 4×8 region will be one. When a 4:4:4 chroma format is in use and the encountered region size is one of a predetermined set of region sizes (such as a 32×32 region), an inferred split takes place and the maximum number of transforms will be four. Otherwise (the region size is not one of the predetermined set of region sizes) the maximum number will be one. For example, if 8×8 is not one of the predetermined set of region sizes, then the maximum number of transforms for an 8×8 region will be one. Although the predetermined set of region sizes described above includes 8×16, other predetermined sets of region sizes are possible, such as only 16×32 when a 4:2:2 chroma format is in use or 32×32 when a 4:4:4 chroma format is in use. For each chroma channel, if the parent node had a coded block flag value of one, then for each of the (n) transforms, a coded block flag is decoded from the encoded bitstream 312. 
For example, when the maximum number of transforms is equal to two, coded block flags 942 and 944 indicate the presence of a transform for each of the two regions inferred by the split.

A decode chroma residual coefficient arrays step 1106 then decodes, from the encoded bitstream 312, the residual coefficient array for each of the maximum number of transform regions of each chroma channel having a coded block flag value of one. The number of residual coefficient arrays decoded for a given transform unit for a given chroma channel depends on the coded block flag value of each transform and will thus vary from zero to (at most) the ‘number (n) of transforms’. For example, when the number of transforms is two and both chroma channels have coded block flags of one for each of the count values, then the chroma residual blocks 956, 958, 960 and 962 are decoded from the encoded bitstream 312. The decode chroma residual coefficient arrays step 1106 is generally performed by the entropy decoder 420 for each chroma residual coefficient array having a coded block flag value of one.

A select inverse transform step 1108 then selects an inverse transform from a predetermined set of inverse transforms, for each of the maximum number of transforms having a coded block flag value of one for each chroma channel. For example, when the maximum number of transforms is two and the region size is 16×32 and the coded block flag value for each of the two transforms is one, then a 16×16 inverse transform is selected for each of the 16×16 regions resulting from the inferred split.

An apply inverse transform step 1110 then performs the inverse transform, for each of the maximum number of transform regions, on the corresponding region having a coded block flag value of one. This results in a conversion of each chroma residual coefficient array (frequency domain representation) into a chroma residual sample array (spatial domain representation) representative of the decoded video frame. The apply inverse transform step 1110 is generally performed by the inverse scale and transform module 422.

FIG. 12A shows a diagonal scan pattern 1201, FIG. 12B shows a horizontal scan pattern 1202, and FIG. 12C shows a vertical scan pattern 1203, each for a 4×8 transform unit 1200. Those implementations that scan the 4×8 transform unit 1200 using the illustrated scan patterns have the property that the residual coefficients are grouped in 4×4 blocks, known as ‘sub-blocks’. A ‘coefficient group’ flag present in the encoded bitstream 312 may therefore be used to indicate, for each sub-block, the presence of at least one significant (non-zero) residual coefficient. Applying a 4×4 sub-block size for the 4×8 transform achieves consistency with the scan pattern present in other transform sizes, where coefficients are always grouped into sub-blocks.
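
The meaning of the coefficient group flag can be illustrated with a small sketch. The function name `coefficient_group_flags` is hypothetical and the coefficient array is assumed to be a list of rows; the sketch only derives one flag per 4×4 sub-block and says nothing about the order in which the flags or coefficients are scanned.

```python
def coefficient_group_flags(coeffs, sub_block=4):
    """Return one flag per sub-block: 1 if any coefficient in it is non-zero."""
    height = len(coeffs)
    width = len(coeffs[0])
    flags = []
    for y0 in range(0, height, sub_block):
        row = []
        for x0 in range(0, width, sub_block):
            significant = any(
                coeffs[y][x] != 0
                for y in range(y0, y0 + sub_block)
                for x in range(x0, x0 + sub_block)
            )
            row.append(int(significant))
        flags.append(row)
    return flags
```

A 4×8 array (four columns, eight rows) whose only non-zero value is the DC coefficient therefore has its upper sub-block flagged and its lower sub-block not flagged.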

Particular implementations may apply a coefficient group flag to signal the presence of at least one non-zero residual coefficient in each sub-block. Advantageously, these scan patterns permit re-use of control software or digital circuitry that processes residual coefficients, by reusing the sub-block processing for all transform sizes. The particular scan pattern used may be selected according to criteria such as the intra-prediction direction of the collocated prediction unit (PU). Where a transform encodes chroma samples on a 4:2:2 chroma format sample grid, the relationship between the intra-prediction direction and the scan pattern is altered because each chroma sample maps to a non-square (2×1) array of luma samples, affecting the ‘direction’ or angle of the intra-prediction mode. Scanning is shown in a ‘backward’ direction in FIGS. 12A to 12C, ending at the DC coefficient, located in the top-left corner of the transform unit (TU). Further, scanning is not required to start at the lower-right corner of the transform unit (TU). Due to the predominance of non-zero residual coefficients in the upper-left region of the transform unit (TU), scanning may begin from a ‘last significant coefficient position’ and progress in a backward direction until the upper-left coefficient is reached.

Other implementations may apply a single scan to a given region to encode residual coefficients and then apply more than one transform to these residual coefficients. In this case only one coded block flag is used for the region and therefore for all transforms covered by the scan pattern. The coded block flag is set to one if at least one significant residual coefficient exists in any of the scans. For example, the 4×8 scan patterns of FIGS. 12A-12C may be applied to encode residual coefficients of two 4×4 transforms. The two 4×4 arrays of residual coefficients may be concatenated to form a 4×8 array suitable for the scan pattern. As a single scan is performed over the array, a single ‘last significant coefficient’ position is encoded in the bitstream for the scan pattern and a single coded block flag value is sufficient for the array. The energy compaction property of the modified discrete cosine transform (DCT) gives advantage to other schemes, such as interleaving the coefficients of each square transform along the path of the scan pattern into the rectangular coefficient array. This gives the advantage that the density of residual coefficient values in each 4×4 residual coefficient array is approximately equalised in the combined 4×8 array, allowing higher compression efficiency to be achieved by the entropy encoder 324, for subsequent decoding by the entropy decoder 420.

Certain implementations encoding chroma colour channels may use a first transform to encode residual samples at chroma sample locations corresponding to a 4:2:0 chroma sample grid and a second transform to encode residual samples at the additional chroma sample locations introduced in the 4:2:2 chroma sample grid, relative to the 4:2:0 chroma sample grid. Such implementations may advantageously use a simplified transform for the second transform, such as a Hadamard transform, with the output of the second transform being added (or otherwise combined) to the residual samples for the first transform to produce the residual samples for the second transform. Advantageously, a preprocessing stage implementing a transform such as a Haar transform may be used to sample the chroma sample grid for a 4:2:2 chroma format into the chroma sample grid for a 4:2:0 chroma format. Such configurations must transmit additional residual coefficients from the preprocessing stage as side-information, such as a residual applied to each largest coding unit (LCU) in the case that the preprocessing transform is applied at the largest coding unit (LCU) level.

Implementations having multiple transforms for a given region may use either a single combined scan covering the entire region, or a separate scan for each transform. If the scanning for the multiple transforms is combined into a single scan, then only one coded block flag is required for each region being scanned. Those implementations using a single combined scan may achieve higher compression of the residual coefficients by interleaving the residual coefficients of each transform, such as interleaving on a coefficient-by-coefficient basis, in order to collocate residual coefficients from each transform having similar spectral properties.
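
Coefficient-by-coefficient interleaving may be sketched as follows. This is one possible reading only: each transform's residual coefficients are assumed to be supplied as a flat list already arranged in that transform's own scan order, and the names `interleave` and `deinterleave` are hypothetical.

```python
def interleave(first, second):
    """Merge two equal-length coefficient lists, alternating between them."""
    if len(first) != len(second):
        raise ValueError("both transforms must have the same number of coefficients")
    combined = []
    for a, b in zip(first, second):
        combined.extend((a, b))
    return combined


def deinterleave(combined):
    """Recover the two coefficient lists from an interleaved list."""
    return combined[0::2], combined[1::2]
```

Because coefficients at the same scan position of each transform become neighbours in the combined list, coefficients with similar spectral properties are collocated, and the decoder can recover the two arrays with `deinterleave`.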

FIG. 13 is a schematic block diagram showing a method 1300 of encoding a transform unit. The method 1300, performed by the video encoder 114, encodes the luma channel and a chroma channel of the transform unit. In a determine luma transform skip flag value step 1302, the transform skip control module 346 determines the value of a transform skip flag, such as the transform skip flag 964 or 9127, for the luma channel, typically by testing the cost of coding the residual sample array 360 in both the spatial domain (transform skip is performed) and in the frequency domain (transform skip is not performed). In a determine chroma transform skip flag value step 1304, the transform skip control module 346 determines or otherwise sets the value of a transform skip flag, such as the transform skip flag 966 or 9148, for one of the chroma channels to be applied to all of the sub-regions resulting from an inferred split and belonging to the same chroma channel. The transform skip control module 346 may apply logic similar to that for the luma channel; however, the bit-rate cost determination must account for each of the chroma residual sample arrays resulting from the inferred split when determining the cost of either performing the transform skip for all chroma residual sample arrays in the chroma channel (or ‘colour channel’) or performing the transform skip for none of the chroma residual sample arrays in the chroma channel. The determine chroma transform skip flag value step 1304 is repeated for each chroma channel, determining transform skip flag values for other chroma channels, such as transform skip flags 968 or 9151. The encode luma transform and chroma transform step 1306 encodes the luma residual sample array in the encoded bitstream 312 using the entropy encoder 324 and encodes the chroma residual sample arrays for a chroma channel in the encoded bitstream 312 using the entropy encoder 324. 
The luma residual sample array is determined in accordance with the luma transform skip flag, either by transforming in the transform module 320 the residual sample array into a residual coefficient array or bypassing the transform module 320 when a transform skip is performed by the video encoder 114. Subsequently the residual array 363 is passed to the scale and quantise module 322 to create the residual data array 364. When at least one of the values in the residual data array 364 is non-zero, the values of the residual data array 364 are encoded into the encoded bitstream 312 by the entropy encoder 324 (in a block of residual data, such as residual data block 954, 956, 958, 960 or 962) and the corresponding coded block flag is set to one. The chroma residual sample arrays are determined similarly to the luma residual sample arrays, except that chroma residual sample arrays other than the first share the transform skip flag with the first chroma residual sample array. The encoding of chroma residual sample arrays in the step 1306 is repeated for each chroma channel.
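
The all-or-none decision made in the step 1304 may be sketched as a comparison of summed costs. The sketch is illustrative only: the function name is hypothetical, the per-array costs are assumed to have been measured elsewhere (for example as bit-rate or rate-distortion costs), and resolving a tie in favour of not skipping is an arbitrary assumption.

```python
def choose_chroma_transform_skip(costs_with_skip, costs_without_skip):
    """Decide one transform skip flag shared by all sub-regions of a chroma channel.

    Each argument lists the coding cost of every chroma residual sample array
    resulting from the inferred split, measured with and without transform skip.
    """
    if len(costs_with_skip) != len(costs_without_skip):
        raise ValueError("one cost per chroma residual sample array is required")
    # The flag applies to all arrays or to none, so compare total costs.
    return sum(costs_with_skip) < sum(costs_without_skip)
```

For example, with costs of [10, 12] when skipping and [9, 20] when transforming, transform skip is chosen for both arrays even though the first array alone would have been cheaper to transform.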

FIG. 14 is a schematic flow diagram showing a method 1400 for decoding a transform unit. The method 1400, performed by the video decoder 134, decodes the luma channel and a chroma channel of the transform unit. A determine luma transform skip flag value step 1402 determines the value of a transform skip flag for the luma channel by decoding a transform skip flag, such as the transform skip flag 964 or 9127, from the encoded bitstream 312 using the entropy decoder 420. A determine chroma transform skip flag value step 1404 determines the value of a transform skip flag for one of the chroma residual sample arrays within a chroma channel to be applied to all chroma residual sample arrays within the chroma channel and in the same transform unit (TU). The step 1404 decodes a transform skip flag, such as the transform skip flag 966 or 9148, from the encoded bitstream 312 using the entropy decoder 420. Implementations that associate the transform skip flag with the first chroma residual sample array avoid the need to buffer earlier residual sample arrays before determining the transform skip flag from a later residual coefficient array (which would then be used to continue processing the earlier residual sample array, thus introducing additional internal buffering). The step 1404 may also determine a transform skip flag for additional chroma channels, such as by decoding the transform skip flag 968 or 9151 from the encoded bitstream 312 using the entropy decoder 420. A decode luma transform and chroma transform step 1406 causes the entropy decoder 420 to decode a luma residual coefficient array, such as the luma residual data block 954, when a corresponding coded block flag, such as the coded block flag 950, is one, and the chroma residual coefficient arrays associated with a particular chroma channel, such as the chroma residual coefficient arrays 956 and 958, when each corresponding coded block flag, such as the coded block flags 942 and 944, is one. 
When decoding a luma transform, the luma residual coefficient array is only passed through the inverse transform module 422 if a transform skip is not performed; otherwise the luma residual coefficient array bypasses the inverse transform module 422. When decoding a chroma transform, for each chroma residual sample array in the transform unit, the transform skip flag present in the encoded bitstream 312 and associated with the first chroma residual sample array is applied.

The description of the methods 1300 and 1400 refers to a ‘transform unit’ that may contain multiple chroma residual sample arrays for a given chroma channel, when an inferred split takes place. This accords with the syntax structure 930. When the syntax structures 9100, 9130 and 9160 are in use, each chroma region resulting from an inferred split is illustrated as a separate transform unit (TU), marked as chroma transform units (CTUs) in FIGS. 9C, 9D and 9E. For the purposes of the methods 1300 and 1400, the chroma transform units (CTUs) are merely an artefact of using the transform tree syntax structure 9100 to split the chroma regions. In FIG. 9C, the spatial region occupied by the luma transform unit (LTU) 9126 may be considered the ‘transform unit’ as it occupies the same spatial region as the transform unit 952. The chroma transform units (CTUs) 9116, 9118 and 9120-9122 (if present) may be considered as chroma sub-regions resulting from the inferred split.

Advantageously, both the methods 1300 and 1400 result in one transform skip flag being encoded for each colour channel, regardless of the presence or absence of an inferred split operation (which may be applicable when the 4:2:2 and the 4:4:4 chroma formats are in use). This characteristic results in consistent behaviour with the 4:2:0 chroma format, where one transform skip flag is present for each residual coefficient array, and only one residual coefficient array is present for each colour channel for a given transform unit. For example, an 8×8 transform unit in 4:2:0 would have an 8×8 transform for luma and a 4×4 chroma transform for each chroma channel. One transform skip flag would be present for each chroma channel in this case. In the 4:2:2 case, with an inferred split, two 4×4 chroma transforms would be present in each chroma channel. A transform skip flag coded with the first 4×4 chroma transform but applied to both 4×4 chroma transforms would control the transform skip status for the same spatial region as for the 4:2:0 case. This consistent behaviour results in transform skip handling for 4:2:2 that is backward compatible with the 4:2:0 case (i.e. no rearrangement of syntax elements occurs in 4:2:0 due to supporting transform skip in 4:2:2). Having a common transform skip flag for all chroma regions resulting from an inferred split avoids artificially dividing a transform unit into an upper half and a lower half for the purposes of specifying the transform skip.

FIG. 15 is a schematic representation showing possible arrangements of 4×4 transforms in a 4×4 and an 8×8 transform unit, for the video encoder 114 and the video decoder 134. The colour channels Y, U and V are depicted in FIG. 15 in columns and three cases are depicted along rows. In all depicted cases the video encoder 114 and the video decoder 134 are configured to use a 4:2:2 chroma format. Also, in all cases, the video encoder 114 and the video decoder 134 support an inferred split of the 4×8 chroma region into two 4×4 chroma regions, and thus two 4×4 chroma transforms are depicted for each colour channel. The three cases depicted are:

Case 1: an 8×8 transform unit (TU) (upper row);

Case 2: four 4×4 transform units (TUs) with a first ordering (order 1) of the transforms (middle row); and

Case 3: four 4×4 transform units (TUs) with a second ordering (order 2) of the transforms (lower row).

For each case, the transforms are numbered in the order in which they appear in the encoded bitstream 312. Case 1 shows a transform unit (TU) with an 8×8 luma transform and two 4×4 transforms for each chroma channel. The luma transform does not have a transform skip flag as the luma transform is 8×8. Cases 2 and 3 further illustrate the case where the four 4×4 transform units result in chroma regions for each chroma transform that span multiple transform units (TUs). In Cases 2 and 3, the four transform units (TUs) are numbered from zero to three and indexed with a ‘blkIdx’ variable, as used in the high efficiency video coding (HEVC) standard under development. For each transform depicted in FIG. 15, if a transform skip is supported, a box is included in the upper-left corner of the transform. For transforms where the transform skip flag is always explicitly coded, the box is shaded (such as shaded box 1502). An unshaded box (such as unshaded box 1504) illustrates the case where the transform skip flag for the present transform is derived from an earlier transform (such as an above transform). Implementations which do not support this derivation will explicitly code a transform skip flag in the encoded bitstream 312 for transforms with unshaded boxes. In Case 2 and Case 3, a transform unit syntax structure, such as the transform unit syntax structure 952, is invoked four times (with the value for ‘blkIdx’ incrementing from zero to three), once for each 4×4 transform unit. Thus four instances of the transform unit syntax structure are present in the encoded bitstream 312. On each invocation, a luma residual block, such as the luma residual data block 954, is present in the encoded bitstream 312 if a corresponding coded block flag, such as the coded block flag 950, has a value of one. 
In Case 2, on the fourth invocation (‘blkIdx’ is equal to three), chroma residual blocks for the chroma channels, such as the chroma residual blocks 956, 958, 960, 962, are coded in the encoded bitstream 312 (if corresponding coded block flags, such as the coded block flags 942, 944, 946, 948, have a value of one). The ordering of the luma and chroma residual blocks from FIG. 9B corresponds to the ordering of transforms presented in Case 2. In Case 3, the ordering is changed as follows: chroma residual blocks for the upper half (such as the chroma residual blocks 956, 960) are processed on the second invocation of the transform unit syntax structure (i.e. when ‘blkIdx’ is equal to one) and chroma residual blocks for the lower half (such as the chroma residual blocks 958, 962) are processed on the fourth invocation of the transform unit syntax structure (i.e. when ‘blkIdx’ is equal to three).
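
The two orderings may be compared with a short sketch. The labels and the function name `residual_block_order` are hypothetical, every coded block flag is assumed to be one so that every block is present, the luma residual block is assumed to precede any chroma residual blocks within one invocation, and the first chroma channel is assumed to be U.

```python
def residual_block_order(case):
    """List residual blocks in bitstream order for Case 2 or Case 3 of FIG. 15."""
    if case not in (2, 3):
        raise ValueError("only Case 2 and Case 3 use four 4x4 transform units")
    order = []
    for blk_idx in range(4):
        # One luma residual block per invocation of the transform unit syntax.
        order.append("Y%d" % blk_idx)
        if case == 2 and blk_idx == 3:
            # Order 1: all chroma blocks follow the fourth luma block.
            order += ["U_upper", "U_lower", "V_upper", "V_lower"]
        elif case == 3 and blk_idx == 1:
            # Order 2: the upper halves follow the second luma block.
            order += ["U_upper", "V_upper"]
        elif case == 3 and blk_idx == 3:
            # Order 2: the lower halves follow the fourth luma block.
            order += ["U_lower", "V_lower"]
    return order
```

Under these assumptions, Case 2 yields Y0, Y1, Y2, Y3 followed by the four chroma blocks, whereas Case 3 places the upper chroma blocks after Y1 and the lower chroma blocks after Y3.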

Another case, not illustrated in FIG. 15, is that of a 4×4 transform unit when the 4:2:0 chroma format is in use, where one 4×4 transform for chroma is applied to the area on the chroma sample grid that corresponds to the four 4×4 transform units for luma at the same quad-tree hierarchical level (collectively occupying an 8×8 region on the luma sample grid). When a 4×8 transform is available in chroma, transform skip for the 4:2:2 case is applied to the 4×8 transform (in addition to the 4×4 transform), as described with reference to FIG. 18 below. When a 4×8 transform is not available in chroma and the 4:2:2 chroma format is in use, implementations must use two 4×4 transforms for each chroma channel and may code the transform skip flag for one 4×4 transform, such as the upper 4×4 transform, but apply the coded transform skip flag to both 4×4 transforms for the given chroma channel.

FIG. 17 is a schematic flow diagram showing a method 1700 for decoding residual data for a transform unit (TU), elaborating upon aspects of the method 1400 of FIG. 14. The method 1700 determines a transform skip flag for a given region and decodes the residual data for the region. When the method 1700 is invoked for the luma channel of a transform unit (TU), only one region exists. For a single chroma channel of a transform unit (TU) and when an inferred split occurs, two regions are present and the method 1700 is invoked for each region having a coded block flag value of one. The method 1700 begins with a transform skip supported test step 1702. The step 1702 tests a transform skip enabled flag, a coding unit transform quantisation bypass flag and the transform size for the present region. The transform skip enabled flag, encoded in the encoded bitstream 312, indicates if the transform skip function is available in the encoded bitstream 312. The coding unit transform quantisation bypass flag, encoded in the encoded bitstream 312, indicates if a ‘lossless’ coding mode was selected by the video encoder 114, whereby both the transform module 320 and the scale and quantise module 322 are bypassed, and thus the video encoder 114 operates in a lossless mode, allowing the video decoder 134 to exactly reproduce captured frame data from the video source 112. The transform size for the present region is indicated by a ‘log2TrafoSize’ variable in the high efficiency video coding (HEVC) standard under development, which is defined as the log 2 of the side dimension of a square transform. When the transform skip enabled flag is true (i.e. enabled), the coding unit transform quantisation bypass flag is false (i.e. not enabled) and the transform size is 4×4 (i.e. log2TrafoSize is equal to 2), control passes to a first true coded block flag (CBF) region in a colour channel test step 1704; otherwise control passes to a decode residual data step 1712. 
The test step 1704 determines if the present region is the first region in the colour channel (and in the transform unit (TU)) to have a coded block flag (CBF) value of one. As the method 1700 is only invoked if the value of the coded block flag for the present region is one, two cases are possible. If the method 1700 is invoked for the first chroma region (the upper region when a 4:2:2 chroma format is in use, e.g. the region 682 or 666 in FIG. 6C) of an inferred split, then the test step 1704 evaluates as true and control passes to a decode transform skip flag step 1706. If the method 1700 is invoked for the subsequent chroma region(s) of an inferred split (the lower region when a 4:2:2 chroma format is in use, e.g. the region 684 or 668 in FIG. 6C), the test step 1704 evaluates as false when the method 1700 was previously invoked for the first chroma region (for the present transform unit) and true when the method 1700 was not previously invoked for the first chroma region (for the present transform unit). When the test step 1704 evaluates as true, control passes to the decode transform skip flag step 1706. In the step 1706, the entropy decoder 420 decodes a transform skip flag from the encoded bitstream 312 to determine a transform skip flag value. A store transform skip flag value step 1708 stores the transform skip flag value in memory, such as hardware registers or registers 246, for later use on subsequent invocations of the method 1700. If the test step 1704 evaluates as false, control passes to a retrieve transform skip flag value step 1710, where the transform skip flag value, determined and stored on a previous invocation of the method 1700, is retrieved from memory, such as hardware registers or registers 246. At a decode residual data step 1712, a block of residual data, such as residual data block 954, 956, 958, 960 or 962, is decoded from the encoded bitstream 312 by the entropy decoder 420.
The determined transform skip flag value is passed as the transform skip flag value 468 to control the transform skip operation, as described above with reference to the multiplexer 423. The steps 1702-1710 correspond to the step 1402 of FIG. 14 when the method 1700 is invoked for the luma channel, and the steps 1702-1710 correspond to the step 1404 of FIG. 14 when the method 1700 is invoked for a chroma channel. The decode residual data step 1712 corresponds to the luma residual decoding of the step 1406 of FIG. 14 and the chroma residual decoding of the step 1406 of FIG. 14. The method 1700 also corresponds to a ‘residual coding’ syntax structure, as defined in the high efficiency video coding (HEVC) standard under development.
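
The flag handling of the steps 1702 to 1710 may be sketched in Python as follows. This is an illustrative sketch only and not the implementation of the video decoder 134: the names TransformSkipState, transform_skip_supported, determine_transform_skip, tu_id, channel and read_flag are hypothetical, the dictionary merely stands in for the registers used by the step 1708, and returning a flag value of false when the test step 1702 fails is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TransformSkipState:
    """Hypothetical store standing in for the registers written by step 1708."""
    stored: Dict[Tuple[int, int], bool] = field(default_factory=dict)


def transform_skip_supported(ts_enabled: bool, cu_bypass: bool,
                             log2_trafo_size: int) -> bool:
    """Step 1702: skip enabled, no lossless bypass, and a 4x4 transform."""
    return ts_enabled and not cu_bypass and log2_trafo_size == 2


def determine_transform_skip(state: TransformSkipState, tu_id: int, channel: int,
                             ts_enabled: bool, cu_bypass: bool,
                             log2_trafo_size: int,
                             read_flag: Callable[[], bool]) -> bool:
    """Steps 1702-1710 for one region whose coded block flag is one."""
    if not transform_skip_supported(ts_enabled, cu_bypass, log2_trafo_size):
        return False  # assumption: no flag is coded, so no transform skip
    key = (tu_id, channel)
    if key not in state.stored:
        # Step 1704 true: first CBF=1 region of this channel in this TU.
        state.stored[key] = read_flag()  # steps 1706 and 1708
    # Step 1710 (or the value just stored): reuse the flag for this channel.
    return state.stored[key]
```

In this sketch the second region of an inferred split never calls read_flag, so only one transform skip flag is consumed per colour channel of a transform unit, whichever region is the first to have a coded block flag value of one.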

FIG. 18 is a schematic representation 1800 showing a transform skip operation applied to a 4×8 chroma region (with a 4×8 non-square transform) for each colour channel. The luma channel (‘Y’) and each chroma channel (‘U’ and ‘V’) are depicted in FIG. 18. Two cases are depicted in FIG. 18:

Case 1: ‘8×8 TU’ (the upper row of FIG. 18) depicts an 8×8 transform unit (TU), with an 8×8 transform 1802 for the luma channel and a 4×8 (non-square or rectangular) transform 1804 for each chroma channel. A transform skip flag is depicted with a shaded box in the upper right corner of a transform for which the transform skip operation is supported. In this case, the transform skip operation is also supported in the 4×8 transform case (in addition to the 4×4 transform case) and thus the 4×8 transforms each include a transform skip flag 1806, as illustrated in FIG. 18.

Case 2: ‘Four 4×4 TUs’ (the lower row of FIG. 18) depicts four 4×4 transform units (TUs), with four 4×4 transforms 1808 for the luma channel and a 4×8 (non-square or rectangular) transform 1810 for each chroma channel. The 4×8 transform for each chroma channel is collocated (on the chroma sample grid) with the luma transform (on the luma sample grid) and shared among the four 4×4 transform units (TUs). In this implementation, the transform skip operation is also supported in the 4×8 transform case (in addition to the 4×4 transform case) and thus the 4×8 transforms include a transform skip flag 1812, as illustrated in FIG. 18.

For an implementation supporting Cases 1 and 2 of FIG. 18, a modified test step 1702 and steps 1706 and 1712 of the method 1700 are performed by the video decoder 134. The modified test step 1702 operates as the test step 1702 of FIG. 17, except that a transform size of 4×8 is included (in addition to a transform size of 4×4) as a possible transform size for which a transform skip operation is supported, thus allowing the modified test step 1702 to evaluate as true in both the 4×4 and 4×8 transform cases.
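
The difference between the test step 1702 and the modified test step 1702 may be illustrated with the following Python sketch. The names Size, SKIP_SIZES_4X4_ONLY, SKIP_SIZES_WITH_4X8 and modified_skip_supported are hypothetical and are introduced only for illustration; transform sizes are assumed to be expressed as (width, height) pairs.

```python
from typing import FrozenSet, Tuple

Size = Tuple[int, int]  # assumed convention: (width, height) in samples

# Sizes for which transform skip is supported by the test step 1702 of FIG. 17.
SKIP_SIZES_4X4_ONLY: FrozenSet[Size] = frozenset({(4, 4)})
# Sizes supported by the modified test step 1702 (Cases 1 and 2 of FIG. 18).
SKIP_SIZES_WITH_4X8: FrozenSet[Size] = frozenset({(4, 4), (4, 8)})


def modified_skip_supported(ts_enabled: bool, cu_bypass: bool, size: Size,
                            allowed: FrozenSet[Size] = SKIP_SIZES_WITH_4X8) -> bool:
    """True when a transform skip flag would be decoded for a transform of `size`."""
    return ts_enabled and not cu_bypass and size in allowed
```

Passing SKIP_SIZES_4X4_ONLY as the allowed set reproduces the unmodified behaviour, in which a 4×8 chroma transform never carries a transform skip flag.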

Appendix A illustrates possible ‘text’ for the high efficiency video coding (HEVC) standard under development that is relevant to the syntax structure 900 and the syntax structure 930. Each instance of a transform_tree( ) function in Appendix A is depicted as a portion of the syntax structure labelled ‘TT’ in FIGS. 9A and 9C and each instance of a transform_unit( ) function in Appendix A is depicted as a portion of the syntax structure labelled ‘TU’ in FIGS. 9A and 9B. The text provided in Appendix A is one example of text that accords with the syntax structures 900 and 930 and other examples are possible. Text that accords with the syntax structures 900 and 930 implies that the video encoder 114 performs the method 1000 to encode a bitstream and the video decoder 134 performs the method 1100 to decode the bitstream.

Appendix B illustrates possible text for the high efficiency video coding (HEVC) standard under development that is relevant to the syntax structure 9100 and the syntax structure 9130. Each instance of a transform_tree( ) function in Appendix B is depicted as a portion of the syntax structure labelled ‘TT’ in FIGS. 9C, 9D and 9E and each instance of a transform_unit( ) function in Appendix B is depicted as a portion of the syntax structure labelled ‘TU’ in FIGS. 9C, 9D and 9E. The text provided in Appendix B is one example of text that accords with the syntax structures 9100 and 9130 and other examples are possible. Text that accords with the syntax structures 9100 and 9130 also implies that the video encoder 114 performs the method 1000 to encode a bitstream and the video decoder 134 performs the method 1100 to decode the bitstream.

The text in Appendix A and Appendix B results in an implementation whereby the 32×32 chroma region encountered in a transform unit (TU) of size 32×32 configured for the 4:4:4 chroma format results in (a maximum number of) four 16×16 chroma transforms being applied, and the 16×32 chroma region encountered in a transform unit (TU) of size 32×32 configured for the 4:2:2 chroma format results in (a maximum number of) two 16×16 chroma transforms being applied. When the implementation resulting from the text in Appendix A and Appendix B is applied to transform units (TUs) of smaller size configured for the 4:2:2 chroma format, (a maximum of) one chroma transform is applied. For example, an 8×16 transform is applied to an 8×16 chroma region and a 4×8 transform is applied to a 4×8 chroma region.
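
The transform counts described above follow from the derivation of TrafoCrCbHorCnt, TrafoCrCbVertCnt and TrafoCrCbCnt given in the appendices, and may be sketched in Python as follows. The function name chroma_transform_counts and its argument names are hypothetical; the sketch assumes that chroma_format_idc values of 1, 2 and 3 denote the 4:2:0, 4:2:2 and 4:4:4 chroma formats respectively.

```python
from typing import Tuple


def chroma_transform_counts(log2_trafo_size: int, split_transform_flag: bool,
                            chroma_format_idc: int) -> Tuple[int, int, int]:
    """Return (TrafoCrCbHorCnt, TrafoCrCbVertCnt, TrafoCrCbCnt) for one TU."""
    hor_cnt, vert_cnt = 1, 1  # default: a single chroma transform per channel
    if log2_trafo_size == 5 and not split_transform_flag:
        if chroma_format_idc == 2:
            # 4:2:2: the 16x32 chroma region is split vertically into two.
            vert_cnt = 2
        elif chroma_format_idc == 3:
            # 4:4:4: the 32x32 chroma region is split into four.
            hor_cnt, vert_cnt = 2, 2
    return hor_cnt, vert_cnt, hor_cnt * vert_cnt
```

For a 32×32 transform unit the sketch gives a count of four for the 4:4:4 chroma format and two for the 4:2:2 chroma format, and a count of one for all smaller transform units, in agreement with the paragraph above.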

INDUSTRIAL APPLICABILITY

The arrangements described are applicable to the computer and data processing industries and particularly for the digital signal processing for the encoding and decoding of signals such as video signals.

The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

(Australia only) In the context of this specification, the word “comprising” means “including principally but not necessarily solely” or “having” or “including”, and not “consisting only of”. Variations of the word “comprising”, such as “comprise” and “comprises” have correspondingly varied meanings.

Appendix A Transform_Tree( ) and Transform_Unit( ) Implement the Inferred Chroma Split Using a Loop Construct

7.3.11 Transform tree syntax

transform_tree( x0, y0, xBase, yBase, log2TrafoSize, trafoDepth, blkIdx ) {    Descriptor
  if( log2TrafoSize <= Log2MaxTrafoSize &&
      log2TrafoSize > Log2MinTrafoSize &&
      trafoDepth < MaxTrafoDepth && !( IntraSplitFlag && trafoDepth == 0 ) )
    split_transform_flag[ x0 ][ y0 ][ trafoDepth ]    ae(v)
  if( trafoDepth == 0 || log2TrafoSize > 2 ) {
    if( trafoDepth == 0 || cbf_cb[ xBase ][ yBase ][ trafoDepth − 1 ] ) {
      for( tIdx = 0; tIdx < TrafoCrCbCnt; tIdx++ ) {
        cbf_cb[ x0 + ( ( 1 << log2CrCbTrafoHorSize ) * ( tIdx mod TrafoCrCbHorCnt ) ) ][ y0 + ( ( 1 << log2CrCbTrafoVertSize ) * ( tIdx div TrafoCrCbVertCnt ) ) ][ trafoDepth + ( TrafoCrCbCnt > 1 ) ]    ae(v)
      }
      cbf_cb[ x0 ][ y0 ][ trafoDepth ] |= ( TrafoCrCbCnt > 1 )
    }
    if( trafoDepth == 0 || cbf_cr[ xBase ][ yBase ][ trafoDepth − 1 ] ) {
      for( tIdx = 0; tIdx < TrafoCrCbCnt; tIdx++ ) {
        cbf_cr[ x0 + ( ( 1 << log2CrCbTrafoHorSize ) * ( tIdx mod TrafoCrCbHorCnt ) ) ][ y0 + ( ( 1 << log2CrCbTrafoVertSize ) * ( tIdx div TrafoCrCbVertCnt ) ) ][ trafoDepth + ( TrafoCrCbCnt > 1 ) ]    ae(v)
      }
      cbf_cr[ x0 ][ y0 ][ trafoDepth ] |= ( TrafoCrCbCnt > 1 )
    }
  }
  if( split_transform_flag[ x0 ][ y0 ][ trafoDepth ] ) {
    x1 = x0 + ( ( 1 << log2TrafoSize ) >> 1 )
    y1 = y0 + ( ( 1 << log2TrafoSize ) >> 1 )
    transform_tree( x0, y0, x0, y0, log2TrafoSize − 1, trafoDepth + 1, 0 )
    transform_tree( x1, y0, x0, y0, log2TrafoSize − 1, trafoDepth + 1, 1 )
    transform_tree( x0, y1, x0, y0, log2TrafoSize − 1, trafoDepth + 1, 2 )
    transform_tree( x1, y1, x0, y0, log2TrafoSize − 1, trafoDepth + 1, 3 )
  } else {
    if( PredMode[ x0 ][ y0 ] == MODE_INTRA || trafoDepth != 0 ||
        cbf_cb[ x0 ][ y0 ][ trafoDepth ] || cbf_cr[ x0 ][ y0 ][ trafoDepth ] )
      cbf_luma[ x0 ][ y0 ][ trafoDepth ]    ae(v)
    transform_unit( x0, y0, xBase, yBase, log2TrafoSize, trafoDepth, blkIdx )
  }
}

7.3.12 Transform unit syntax

transform_unit( x0, y0, xBase, yBase, log2TrafoSize, trafoDepth, blkIdx ) {    Descriptor
  if( cbf_luma[ x0 ][ y0 ][ trafoDepth ] || cbf_cb[ x0 ][ y0 ][ trafoDepth ] ||
      cbf_cr[ x0 ][ y0 ][ trafoDepth ] ) {
    if( cu_qp_delta_enabled_flag && !IsCuQpDeltaCoded ) {
      cu_qp_delta_abs    ae(v)
      if( cu_qp_delta_abs )
        cu_qp_delta_sign    ae(v)
    }
    if( cbf_luma[ x0 ][ y0 ][ trafoDepth ] )
      residual_coding( x0, y0, log2TrafoSize, 0 )
    if( log2TrafoSize > 2 ) {
      if( cbf_cb[ x0 ][ y0 ][ trafoDepth ] )
        for( tIdx = 0; tIdx < TrafoCrCbCnt; tIdx++ ) {
          residual_coding( x0 + ( ( 1 << log2CrCbTrafoHorSize ) * ( tIdx mod TrafoCrCbHorCnt ) ), y0 + ( ( 1 << log2CrCbTrafoVertSize ) * ( tIdx div TrafoCrCbVertCnt ) ), log2TrafoSize, 1 )
        }
      if( cbf_cr[ x0 ][ y0 ][ trafoDepth ] )
        for( tIdx = 0; tIdx < TrafoCrCbCnt; tIdx++ ) {
          residual_coding( x0 + ( ( 1 << log2CrCbTrafoHorSize ) * ( tIdx mod TrafoCrCbHorCnt ) ), y0 + ( ( 1 << log2CrCbTrafoVertSize ) * ( tIdx div TrafoCrCbVertCnt ) ), log2TrafoSize, 2 )
        }
    } else if( blkIdx == 3 ) {
      if( cbf_cb[ xBase ][ yBase ][ trafoDepth ] )
        residual_coding( xBase, yBase, log2TrafoSize, 1 )
      if( cbf_cr[ xBase ][ yBase ][ trafoDepth ] )
        residual_coding( xBase, yBase, log2TrafoSize, 2 )
    }
  }
}

7.4.8.1 General Coding Unit Semantics

The variables TrafoCrCbHorCnt and TrafoCrCbVertCnt are derived as follows:

- If log2TrafoSize is equal to 5 and split_transform_flag is equal to 0, TrafoCrCbHorCnt and TrafoCrCbVertCnt are derived as follows:
    - If chroma_format_idc is equal to 1, TrafoCrCbHorCnt and TrafoCrCbVertCnt are equal to 1.
    - Otherwise, if chroma_format_idc is equal to 2, TrafoCrCbHorCnt is equal to 1 and TrafoCrCbVertCnt is equal to 2.
    - Otherwise, if chroma_format_idc is equal to 3, TrafoCrCbHorCnt and TrafoCrCbVertCnt are equal to 2.
- Otherwise, TrafoCrCbHorCnt and TrafoCrCbVertCnt are equal to 1.

The variable TrafoCrCbCnt is derived as TrafoCrCbHorCnt * TrafoCrCbVertCnt.

The variables log2CrCbTrafoHorSize and log2CrCbTrafoVertSize are derived as follows:

- If chroma_format_idc is equal to 1, log2CrCbTrafoHorSize and log2CrCbTrafoVertSize are equal to log2TrafoSize − 1.
- Otherwise, if chroma_format_idc is equal to 2, log2CrCbTrafoHorSize is equal to log2TrafoSize and log2CrCbTrafoVertSize is equal to min( log2TrafoSize − 1, 4 ).
- Otherwise, if chroma_format_idc is equal to 3, log2CrCbTrafoHorSize and log2CrCbTrafoVertSize are equal to min( log2TrafoSize, 4 ).

End of Appendix A

Appendix B Invoke Transform_Tree( ) Once Per Pair of Chroma Channels for Each Chroma Transform Resulting from the Inferred Split

7.3.11 Transform tree syntax

transform_tree( x0, y0, xBase, yBase, log2TrafoSize, trafoDepth, blkIdx, chromaOnly ) {    Descriptor
  if( log2TrafoSize <= Log2MaxTrafoSize &&
      log2TrafoSize > Log2MinTrafoSize &&
      trafoDepth < MaxTrafoDepth && !( IntraSplitFlag && trafoDepth == 0 )
      && !chromaOnly )
    split_transform_flag[ x0 ][ y0 ][ trafoDepth ]    ae(v)
  if( trafoDepth == 0 || log2TrafoSize > 2 ) {
    if( trafoDepth == 0 || cbf_cb[ xBase ][ yBase ][ trafoDepth − 1 ] )
      if( TrafoCrCbCnt > 1 ) {
        cbf_cb[ x0 ][ y0 ][ trafoDepth ] = 1
      } else {
        cbf_cb[ x0 ][ y0 ][ trafoDepth ]    ae(v)
      }
    if( trafoDepth == 0 || cbf_cr[ xBase ][ yBase ][ trafoDepth − 1 ] )
      if( TrafoCrCbCnt > 1 ) {
        cbf_cr[ x0 ][ y0 ][ trafoDepth ] = 1
      } else {
        cbf_cr[ x0 ][ y0 ][ trafoDepth ]    ae(v)
      }
  }
  if( split_transform_flag[ x0 ][ y0 ][ trafoDepth ] || TrafoCrCbCnt > 1 ) {
    x1 = x0 + ( ( 1 << log2TrafoSize ) >> 1 )
    y1 = y0 + ( ( 1 << log2TrafoSize ) >> 1 )
    transform_tree( x0, y0, x0, y0, log2TrafoSize − 1, trafoDepth + 1, 0, TrafoCrCbCnt > 1 )
    if( chroma_format_idc != 2 ) {
      transform_tree( x1, y0, x0, y0, log2TrafoSize − 1, trafoDepth + 1, 1, TrafoCrCbCnt > 1 )
    }
    transform_tree( x0, y1, x0, y0, log2TrafoSize − 1, trafoDepth + 1, 2, TrafoCrCbCnt > 1 )
    if( chroma_format_idc != 2 ) {
      transform_tree( x1, y1, x0, y0, log2TrafoSize − 1, trafoDepth + 1, 3, TrafoCrCbCnt > 1 )
    }
  }
  if( !split_transform_flag[ x0 ][ y0 ][ trafoDepth ] && TrafoCrCbCnt > 1 ) {
    if( ( PredMode[ x0 ][ y0 ] == MODE_INTRA || trafoDepth != 0 ||
        cbf_cb[ x0 ][ y0 ][ trafoDepth ] || cbf_cr[ x0 ][ y0 ][ trafoDepth ] ) && !chromaOnly )
      cbf_luma[ x0 ][ y0 ][ trafoDepth ]    ae(v)
    transform_unit( x0, y0, xBase, yBase, log2TrafoSize, trafoDepth, blkIdx, chromaOnly )
  }
}

7.3.12 Transform unit syntax

transform_unit( x0, y0, xBase, yBase, log2TrafoSize, trafoDepth, blkIdx, chromaOnly ) {    Descriptor
  if( cbf_luma[ x0 ][ y0 ][ trafoDepth ] || cbf_cb[ x0 ][ y0 ][ trafoDepth ] ||
      cbf_cr[ x0 ][ y0 ][ trafoDepth ] ) {
    if( cu_qp_delta_enabled_flag && !IsCuQpDeltaCoded && !chromaOnly ) {
      cu_qp_delta_abs    ae(v)
      if( cu_qp_delta_abs )
        cu_qp_delta_sign    ae(v)
    }
    if( cbf_luma[ x0 ][ y0 ][ trafoDepth ] )
      residual_coding( x0, y0, log2TrafoSize, 0 )
    if( log2TrafoSize > 2 ) {
      if( cbf_cb[ x0 ][ y0 ][ trafoDepth ] )
        residual_coding( x0, y0, log2TrafoSize, 1 )
      if( cbf_cr[ x0 ][ y0 ][ trafoDepth ] )
        residual_coding( x0, y0, log2TrafoSize, 2 )
    } else if( blkIdx == 3 ) {
      if( cbf_cb[ xBase ][ yBase ][ trafoDepth ] )
        residual_coding( xBase, yBase, log2TrafoSize, 1 )
      if( cbf_cr[ xBase ][ yBase ][ trafoDepth ] )
        residual_coding( xBase, yBase, log2TrafoSize, 2 )
    }
  }
}

7.4.8.1 General Coding Unit Semantics

The variables TrafoCrCbHorCnt and TrafoCrCbVertCnt are derived as follows:

- If log2TrafoSize is equal to 5 and split_transform_flag is equal to 0, TrafoCrCbHorCnt and TrafoCrCbVertCnt are derived as follows:
    - If chroma_format_idc is equal to 1, TrafoCrCbHorCnt and TrafoCrCbVertCnt are equal to 1.
    - Otherwise, if chroma_format_idc is equal to 2, TrafoCrCbHorCnt is equal to 1 and TrafoCrCbVertCnt is equal to 2.
    - Otherwise, if chroma_format_idc is equal to 3, TrafoCrCbHorCnt and TrafoCrCbVertCnt are equal to 2.
- Otherwise, TrafoCrCbHorCnt and TrafoCrCbVertCnt are equal to 1.

The variable TrafoCrCbCnt is derived as TrafoCrCbHorCnt * TrafoCrCbVertCnt.

End of Appendix B

1. A method of inverse transforming a plurality of residual coefficient arrays from a video bitstream configured for a 4:2:2 format, the method comprising:
decoding up to four luma residual coefficient arrays, wherein each luma residual coefficient array corresponds to one respective 4×4 luma block of four 4×4 luma blocks, the four 4×4 luma blocks collectively occupying an 8×8 luma region;
decoding, after the up to four luma residual coefficient arrays are decoded, up to two chroma residual coefficient arrays for a first chroma channel, wherein a first one of the chroma residual coefficient arrays for the first chroma channel corresponds to a 4×4 chroma block associated with an upper portion of the 8×8 luma region and a second one of the chroma residual coefficient arrays for the first chroma channel corresponds to a 4×4 chroma block associated with a lower portion of the 8×8 luma region and the second one of the chroma residual coefficient arrays for the first chroma channel is decoded consecutively after the first one of the chroma residual coefficient arrays for the first chroma channel is decoded and wherein the first one of the chroma residual coefficient arrays for the first chroma channel and the second one of the chroma residual coefficient arrays for the first chroma channel are decoded dependent on respective coded block flag values;
decoding, after the up to two chroma residual coefficient arrays for the first chroma channel are decoded, up to two chroma residual coefficient arrays for a second chroma channel, wherein a first one of the chroma residual coefficient arrays for the second chroma channel corresponds to a 4×4 chroma block associated with the upper portion of the 8×8 luma region, and a second one of the chroma residual coefficient arrays for the second chroma channel corresponds to a 4×4 chroma block associated with the lower portion of the 8×8 luma region, and the second one of the chroma residual coefficient arrays for the second chroma channel is decoded consecutively after the first one of the chroma residual coefficient arrays for the second chroma channel is decoded and wherein the first one of the chroma residual coefficient arrays for the second chroma channel and the second one of the chroma residual coefficient arrays for the second chroma channel are decoded dependent on respective coded block flag values; and
applying a 4×4 inverse transform to each of the decoded up to four luma residual coefficient arrays, each of the decoded up to two chroma residual coefficient arrays for the first chroma channel, and each of the decoded up to two chroma residual coefficient arrays for the second chroma channel.